Abstract

In this paper we study functions with low influences on product probability spaces. These 
are functions $f : \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}$ that have $\mathrm{E}[\mathrm{Var}_{\Omega_i}[f]]$ small compared to $\mathrm{Var}[f]$ for each
$i$. The analysis of boolean functions $f : \{-1,1\}^n \to \{-1,1\}$ with low influences has become a
central problem in discrete Fourier analysis. It is motivated by fundamental questions arising 
from the construction of probabilistically checkable proofs in theoretical computer science and 
from problems in the theory of social choice in economics. 

We prove an invariance principle for multilinear polynomials with low influences and bounded 
degree; it shows that under mild conditions the distribution of such polynomials is essentially 
invariant for all product spaces. Ours is one of the very few known non-linear invariance principles.
It has the advantage that its proof is simple and that the error bounds are explicit. We also
show that the assumption of bounded degree can be eliminated if the polynomials are slightly 
“smoothed”; this extension is essential for our applications to “noise stability”-type problems. 

In particular, as applications of the invariance principle we prove two conjectures: the
“Majority Is Stablest” conjecture sn. from theoretical computer science, which was the original
motivation for this work, and the “It Ain’t Over Till It’s Over” conjecture m from social 
choice theory. 


1 Introduction 

1.1 Harmonic analysis of boolean functions 

The motivation for this paper is the study of boolean functions $f : \{-1,1\}^n \to \{-1,1\}$, where
$\{-1,1\}^n$ is equipped with the uniform probability measure. This topic is of significant interest in
theoretical computer science; it also arises in other diverse areas of mathematics including
combinatorics (e.g., sizes of set systems, additive combinatorics), economics (e.g., social choice), metric
spaces (e.g., non-embeddability of metrics), geometry in Gaussian space (e.g., isoperimetric
inequalities), and statistical physics (e.g., percolation, spin glasses).

Beginning with Kahn, Kalai, and Linial’s landmark paper “The Influence Of Variables On 
Boolean Functions” m there has been much success in analyzing questions about boolean functions 
using methods of harmonic analysis. Recall that KKL essentially shows the following (see also ; ITT 
KKL Theorem: If $f : \{-1,1\}^n \to \{-1,1\}$ satisfies $\mathrm{E}[f] = 0$ and $\mathrm{Inf}_i(f) \le \tau$ for all $i$, then
$\sum_{i=1}^n \mathrm{Inf}_i(f) \ge \Omega(\log(1/\tau))$.

We have used here the notation $\mathrm{Inf}_i(f)$ for the influence of the $i$th coordinate on $f$,

$$\mathrm{Inf}_i(f) = \mathop{\mathrm{E}}_{x}\big[\mathrm{Var}_{x_i}[f(x)]\big] = \sum_{S \ni i} \hat{f}(S)^2. \tag{1}$$
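The two expressions in (1) can be compared by brute force on a small example. The following is a minimal sketch, not taken from the paper: the helper names (`majority`, `influence_by_variance`, `influence_by_fourier`) are hypothetical, and the enumeration is exponential in $n$, so it is only meant for tiny cases.

```python
from itertools import product

def majority(x):
    """Majority of an odd number of +/-1 votes (illustrative test function)."""
    return 1 if sum(x) > 0 else -1

def influence_by_variance(f, n, i):
    """E_x[Var_{x_i}[f(x)]] under the uniform measure on {-1,1}^n."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        lo = f(x[:i] + (-1,) + x[i + 1:])
        hi = f(x[:i] + (1,) + x[i + 1:])
        total += ((hi - lo) / 2) ** 2  # variance of a uniform two-point law
    return total / 2 ** n

def fourier_coefficient(f, n, S):
    """hat f(S) = E[f(x) * prod_{j in S} x_j]."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        chi = 1
        for j in S:
            chi *= x[j]
        total += f(x) * chi
    return total / 2 ** n

def influence_by_fourier(f, n, i):
    """Sum of hat f(S)^2 over all subsets S that contain i."""
    total = 0.0
    for mask in range(2 ** n):
        S = [j for j in range(n) if (mask >> j) & 1]
        if i in S:
            total += fourier_coefficient(f, n, S) ** 2
    return total
```

For Majority on 3 bits both functions return 1/2 for every coordinate, since a voter is pivotal exactly when the other two votes disagree.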

Although an intuitive understanding of the analytic properties of boolean functions is emerging, 
results in this area have used increasingly elaborate methods, combining random restriction
arguments, applications of the Bonami-Beckner inequality, and classical tools from probability theory.

See for example mmmmmmmmm- 

As in the KKL paper, some of the more refined problems studied in recent years have involved 
restricting attention to functions with low influences umm (or, relatedly, “non-juntas”). There 
are several reasons for this. The first is that large-influence functions such as “dictators” (i.e.,
functions $f(x_1, \dots, x_n) = \pm x_i$) frequently trivially maximize or minimize quantities studied in
boolean analysis. However this tends to obscure the truth about extremal behaviors among
functions that are “genuinely” functions of $n$ bits. Another reason for analyzing only low-influence
functions is that this subclass is often precisely what is interesting or necessary for applications. 
In particular, the analysis of low-influence boolean functions is crucial for proving hardness of
approximation results in theoretical computer science and is also very natural for the study of social
choice. Let us describe these two settings briefly. 

In the economic theory of social choice, boolean functions $f : \{-1,1\}^n \to \{-1,1\}$ often
represent voting schemes, mapping $n$ votes between two candidates into a winner. In this case, it
is very natural to exclude voting schemes that give any voter an undue amount of influence; see 
e.g. |E|. In the study of hardness of approximation and probabilistically checkable proofs (PCPs), 
the sharpest results often involve the following paradigm: One considers a problem that requires 
labeling the vertices of a graph using the label set [n]; then one relaxes this to the problem of 
labeling the vertices by functions $f : \{-1,1\}^n \to \{-1,1\}$. In the relaxation one thinks of $f$ as
“weakly labeling” a vertex by the set of coordinates that have large influence on $f$. It then
becomes important to understand the combinatorial properties of functions that weakly label with
the empty set. There are by now quite a few results in hardness of approximation that use results 
on low-influence functions or require conjectured such results; e.g., [nnsuEHMiEni. 

In this paper we give a new framework for studying functions on product probability spaces with 
low influences. Our main tool is a simple invariance principle for low-influence polynomials; this 
theorem lets us take an optimization problem for functions on one product space and pass freely 
to other product spaces, such as Gaussian space. In these other settings the problem sometimes 
becomes simpler to solve. It is interesting to note that while in the theory of hypercontractivity 
and isoperimetry it is common to prove results in the Gaussian setting by proving them first in the 
$\{-1,1\}^n$ setting (see, e.g., f2]), here the invariance principle is actually used to go the other way
around. 

As applications of our invariance principle we prove two previously unconnected conjectures 
from boolean harmonic analysis; the first was motivated by hardness of approximation in computer 
science, the second by the theory of social choice from economics: 
Conjecture 1.1 (“Majority Is Stablest” conjecture [55]) Let $0 \le \rho \le 1$ and $\epsilon > 0$ be given.
Then there exists $\tau > 0$ such that if $f : \{-1,1\}^n \to [-1,1]$ satisfies $\mathrm{E}[f] = 0$ and $\mathrm{Inf}_i(f) \le \tau$ for
all $i$, then

$$\mathbb{S}_\rho(f) \le \tfrac{2}{\pi} \arcsin \rho + \epsilon.$$

Here we have used the notation $\mathbb{S}_\rho(f)$ for $\sum_S \rho^{|S|} \hat{f}(S)^2$, the noise stability of $f$. This equals
$\mathrm{E}[f(x) f(y)]$ when $(x, y) \in \{-1,1\}^n \times \{-1,1\}^n$ is chosen so that the pairs $(x_i, y_i) \in \{-1,1\}^2$ are independent
random variables with $\mathrm{E}[x_i] = \mathrm{E}[y_i] = 0$ and $\mathrm{E}[x_i y_i] = \rho$.
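The Fourier formula for $\mathbb{S}_\rho$ is easy to evaluate exactly for small $n$. The sketch below is illustrative only and not part of the paper: it assumes a truth-table encoding in which bit $j$ of the index being set means $x_j = -1$, and the function names are hypothetical. It computes $\mathbb{S}_\rho(\mathrm{Maj}_n)$ and lets one compare it with $\frac{2}{\pi} \arcsin \rho$.

```python
import math

def fourier_transform(values):
    """Fast Walsh-Hadamard transform of a truth table.

    values[m] = f(x), where bit j of m set means x_j = -1.
    Returns c with c[S] = hat f(S), the set S encoded as a bitmask.
    """
    c = list(values)
    size = len(c)
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for j in range(start, start + h):
                a, b = c[j], c[j + h]
                c[j], c[j + h] = a + b, a - b
        h *= 2
    return [v / size for v in c]

def majority_table(n):
    """Truth table of Maj_n for odd n: +1 when fewer than half the votes are -1."""
    return [1 if bin(m).count("1") < n / 2 else -1 for m in range(2 ** n)]

def noise_stability(values, rho):
    """S_rho(f) = sum_S rho^{|S|} hat f(S)^2."""
    coeffs = fourier_transform(values)
    return sum(rho ** bin(S).count("1") * c * c for S, c in enumerate(coeffs))

if __name__ == "__main__":
    rho = 0.5
    for n in (3, 5, 7, 9, 11):
        print(n, noise_stability(majority_table(n), rho))
    print("limit", (2 / math.pi) * math.asin(rho))
```

For $n = 3$ and $\rho = 1/2$ the exact value is $3 \cdot \tfrac14 \cdot \tfrac12 + \tfrac14 \cdot \tfrac18 = 0.40625$; the values for larger odd $n$ approach the limit $\tfrac{2}{\pi}\arcsin\tfrac12 = \tfrac13$.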

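Conjecture 1.2 (“It Ain’t Over Till It’s Over” conjecture [55]) Let $0 \le \rho < 1$ and $\epsilon > 0$
be given. Then there exists $\delta > 0$ and $\tau > 0$ such that if $f : \{-1,1\}^n \to \{-1,1\}$ satisfies $\mathrm{E}[f] = 0$
and $\mathrm{Inf}_i(f) \le \tau$ for all $i$, then $f$ has the following property: If $V$ is a random subset of $[n]$ in which
each $i$ is included independently with probability $\rho$, and if the bits $(x_i)_{i \in V}$ are chosen uniformly at
random, then

$$\mathop{\mathrm{P}}_{V, (x_i)_{i \in V}}\Big[\,\big|\mathrm{E}[f \mid (x_i)_{i \in V}]\big| > 1 - \delta\,\Big] \le \epsilon.$$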

(In words, the conjecture states that even if a random p fraction of voters’ votes are revealed, with 
high probability the election is still slightly undecided, provided / has low influences.) 

The truth of these results illustrates a recurring theme in the harmonic analysis of
boolean functions: the extremal role played by the Majority function. It seems this theme becomes
especially prominent when low-influence functions are studied. To explain the connection of
Majority to our applications: In the former case the quantity $\frac{2}{\pi} \arcsin \rho$ is precisely $\lim_{n \to \infty} \mathbb{S}_\rho(\mathrm{Maj}_n)$;
this explains the name of the Majority Is Stablest conjecture. In the latter case, we show that $\delta$ can
be taken to be on the order of $\epsilon^{\rho/(1-\rho)}$ (up to $o(1)$ in the exponent), which is the same asymptotics
one gets if $f$ is Majority on a large number of inputs.

1.2 Outline of the paper 

We begin in Section 2 with an overview of the invariance principle, the two applications, and some
of their consequences. We prove the invariance principle in Section 3. Our proofs of the two
conjectures are in Section 4. Finally, we show in Section 5 that a conjecture closely related to
Majority Is Stablest is false. Some minor proofs from throughout the paper appear in appendices.

1.3 Related work 

Our multilinear invariance principle has some antecedents. For degree 1 polynomials it reduces to 
a version of the Berry-Esseen Central Limit Theorems. Indeed, our proof follows the same outlines 
as Lindeberg’s proof of the CLT 22] (see also EH). 

Since presenting our proof of the invariance principle, we have been informed by Oded Regev 
that related results were proved in the past by V. I. Rotar' ESI- As well, a contemporary manuscript 
of Sourav Chatterjee ESI with an invariance principle of similar flavor has come to our attention. 
What is common to our work and to mm is a generalization of Lindeberg’s argument to the 
non-linear case. The result of Rotar' is an invariance principle similar to ours where the condition 
on the influences generalizes Lindeberg’s condition. The setup is not quite the same, however, and 
the proof in m is of a rather qualitative nature. It seems that even after appropriate modification 
the bounds it gives would be weaker and less useful for our type of applications. (This is quite 
understandable; in a similar way Lindeberg’s CLT can be less precise than the Berry-Esseen
inequality even though, indeed because, it works under weaker assumptions.) The paper m is
by contrast very clear and explicit. However it does not seem to be appropriate for many
applications since it requires low “worst-case” influences, instead of the “average-case” influences used by
this work and 

Finally, we would like to mention that some chaos-decomposition limit theorems have been 
proved before in various settings. Among these are limit theorems for U and V statistics and limit 
theorems for random graphs; see, e.g. m- 

1.4 Acknowledgments 

We are grateful to Keith Ball for suggesting a collaboration among the authors. We would also like 
to thank Oded Regev for referring us to 051 and Olivier Guedon for referring us to m- 

2 Our results 

2.1 The invariance principle 

In this subsection we present a simplified version of our invariance principle. 

Suppose $X$ is a random variable with $\mathrm{E}[X] = 0$ and $\mathrm{E}[X^2] = 1$ and $X_1, \dots, X_n$ are independent
copies of $X$. Let $Q(x_1, \dots, x_n) = \sum_{i=1}^n c_i x_i$ be a linear form and assume $\sum_i c_i^2 = 1$. The Berry-Esseen
CLT states that under mild conditions on the distribution of $X$, say $\mathrm{E}[|X|^3] \le A < \infty$, it
holds that

$$\sup_t \big|\mathrm{P}[Q(X_1, \dots, X_n) \le t] - \mathrm{P}[G \le t]\big| \le O\Big(A \cdot \sum_{i=1}^n |c_i|^3\Big),$$

where $G$ denotes a standard normal random variable. Note that a simple corollary of the above is

$$\sup_t \big|\mathrm{P}[Q(X_1, \dots, X_n) \le t] - \mathrm{P}[Q(G_1, \dots, G_n) \le t]\big| \le O\big(A \cdot \max_i |c_i|\big). \tag{2}$$

Here the $G_i$'s denote independent standard normals. We have upper-bounded the sum of $|c_i|^3$
here by a maximum, for simplicity; more importantly though, we have suggestively replaced $G$ by
$\sum_i c_i G_i$, which of course has the same distribution.
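For one concrete case the left-hand side of (2) can be computed exactly. The sketch below is an illustration under assumptions not made in the paper: it takes the $X_i$ to be uniform $\pm 1$ bits and $c_i = 1/\sqrt{n}$, so that $Q(G_1, \dots, G_n)$ is a standard normal and $\max_i |c_i| = 1/\sqrt{n}$. The function names are hypothetical.

```python
import math

def normal_cdf(t):
    """P[G <= t] for a standard normal G."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def kolmogorov_distance_rademacher(n):
    """Exact sup_t |P[Q <= t] - P[G <= t]| for Q = (X_1 + ... + X_n)/sqrt(n),
    where the X_i are independent uniform +/-1 bits."""
    worst = 0.0
    cdf_before = 0.0  # P[Q < t_k], the left limit of the CDF at the atom t_k
    for k in range(n + 1):  # k = number of +1 bits
        t = (2 * k - n) / math.sqrt(n)
        cdf_after = cdf_before + math.comb(n, k) / 2 ** n  # P[Q <= t_k]
        phi = normal_cdf(t)
        # The CDF of Q is a step function, so the supremum is approached at its jumps.
        worst = max(worst, abs(cdf_before - phi), abs(cdf_after - phi))
        cdf_before = cdf_after
    return worst

if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, kolmogorov_distance_rademacher(n), 1 / math.sqrt(n))
```

The printed distance shrinks in proportion to $\max_i |c_i| = 1/\sqrt{n}$, which is the behavior (2) describes.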

We would like to generalize (2) to multilinear polynomials in the $X_i$'s; i.e., functions of the form

$$Q(X_1, \dots, X_n) = \sum_{S \subseteq [n]} c_S \prod_{i \in S} X_i, \tag{3}$$

where the real constants $c_S$ satisfy $\sum_S c_S^2 = 1$. Let $d = \max_{c_S \ne 0} |S|$ denote the degree of $Q$.
Unlike in the $d = 1$ case of the CLT, there is no single random variable $G$ which always provides a
limiting distribution. However one can still hope to prove, in light of (2), that the distribution of
the polynomial applied to the variables $X_i$ is close to the distribution of the polynomial applied to
independent Gaussian random variables. This is indeed what our invariance principle shows.

It turns out that the appropriate generalization of the Berry-Esseen theorem (2) is to control
the error by a function of $d$ and of $\max_i \sum_{S \ni i} c_S^2$, i.e., the maximum of the influences of $Q$ (as
in (1)). Naturally, we also need some conditions in addition to second moments. In our formulation
we impose the condition that the variable $X$ is hypercontractive; i.e., there is some $\eta > 0$ such that
for all $a \in \mathbb{R}$,

$$\|a + \eta X\|_3 \le \|a + X\|_2.$$

This condition is satisfied whenever $\mathrm{E}[X] = 0$ and $\mathrm{E}[|X|^3] < \infty$; in particular, it holds for any
mean-zero random variable X taking on only finitely many values. Using hypercontractivity, we 
get a simply proved invariance principle with explicit error bounds. The following theorem (a 
simplification of Theorem KTTS1 bound m) is an example of what we prove: 

Theorem 2.1 Let $X_1, \dots, X_n$ be independent random variables satisfying $\mathrm{E}[X_i] = 0$, $\mathrm{E}[X_i^2] = 1$,
and $\mathrm{E}[|X_i|^3] \le \beta$. Let $Q$ be a degree $d$ multilinear polynomial as in (3) with

$$\sum_{|S| > 0} c_S^2 = 1, \qquad \sum_{S \ni i} c_S^2 \le \tau \ \text{ for all } i.$$

Then

$$\sup_t \big|\mathrm{P}[Q(X_1, \dots, X_n) \le t] - \mathrm{P}[Q(G_1, \dots, G_n) \le t]\big| \le O\big(d \beta^{1/3} \tau^{1/(8d)}\big),$$

where $G_1, \dots, G_n$ are independent standard Gaussians.

If, instead of assuming $\mathrm{E}[|X_i|^3] \le \beta$, we assume that each $X_i$ takes on only finitely many values,
and that for all $i$ and all $x \in \mathbb{R}$ either $\mathrm{P}[X_i = x] = 0$ or $\mathrm{P}[X_i = x] \ge \alpha$, then

$$\sup_t \big|\mathrm{P}[Q(X_1, \dots, X_n) \le t] - \mathrm{P}[Q(G_1, \dots, G_n) \le t]\big| \le O\big(d \alpha^{-1/6} \tau^{1/(8d)}\big).$$

Note that if $d$, $\beta$, and $\alpha$ are fixed then the above bound tends to 0 with $\tau$. We call this theorem an
“invariance principle” because it shows that $Q(X_1, \dots, X_n)$ has essentially the same distribution no
matter what the $X_i$'s are. Usually we will not push for the optimal constants; instead we will try to
keep our approach as simple as possible while still giving explicit bounds useful for our applications.
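To get a feel for how the bound of Theorem 2.1 depends on $d$ and $\tau$, one can evaluate the quantity inside the $O(\cdot)$ directly. This is only a sketch: the implied constant is unknown here and is taken to be 1, and both function names are hypothetical.

```python
def invariance_bound_shape(d, beta, tau):
    """d * beta^(1/3) * tau^(1/(8d)): the expression inside O(.) in Theorem 2.1,
    with the unspecified constant set to 1 (an assumption for illustration)."""
    return d * beta ** (1.0 / 3.0) * tau ** (1.0 / (8 * d))

def tau_needed(d, beta, target):
    """The value of tau at which the expression above equals `target`."""
    return (target / (d * beta ** (1.0 / 3.0))) ** (8 * d)

if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, tau_needed(d, beta=1.0, target=0.1))
```

Already for $d = 1$ and $\beta = 1$, reaching the value $0.1$ requires $\tau = 10^{-8}$, and the required $\tau$ shrinks exponentially in $d$; this is the dependence on $d$ that the smoothing argument described next is designed to avoid.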

An unavoidable deficiency of this sort of invariance principle is the dependence on d in the error 
bound. In applications such as Majority Is Stablest and It Ain’t Over Till It’s Over, the functions 
/ may well have arbitrarily large degree. To overcome this, we introduce a supplement to the 
invariance principle: We show that if the polynomial Q is “smoothed” slightly then the dependence 
on d in the error bound can be eliminated and replaced with a dependence on the smoothness. For 
“noise stability”-type problems such as ours, this smoothing is essentially harmless. 


In fact, the techniques we use are strong enough to obtain Berry-Esseen estimates under 
Lyapunov-type assumptions. In particular, we believe that the following theorem is new even 
in the case of sums of independent random variables. 

Theorem 2.2 Let $q \in (2,3]$. Let $X_1, \dots, X_n$ be independent random variables satisfying $\mathrm{E}[X_i] = 0$,
$\mathrm{E}[X_i^2] = 1$, and $\mathrm{E}[|X_i|^q] \le \beta$. Let $Q$ be a degree $d$ multilinear polynomial as in (3) with

$$\sum_{|S| > 0} c_S^2 = 1, \qquad \sum_{S \ni i} c_S^2 \le \tau \ \text{ for all } i.$$

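Then

$$\sup_t \big|\mathrm{P}[Q(X_1, \dots, X_n) \le t] - \mathrm{P}[Q(G_1, \dots, G_n) \le t]\big| \le O\Big(d \beta^{\frac{1}{qd+1}} \Big(\sum_i \Big(\sum_{S \ni i} c_S^2\Big)^{q/2}\Big)^{\frac{1}{qd+1}}\Big) \le O\Big(d \beta^{\frac{1}{qd+1}} \tau^{\frac{q-2}{2qd+2}}\Big),$$

where $G_1, \dots, G_n$ are independent standard Gaussians.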
2.2 Influences and noise stability in product spaces

Our proofs of the Majority Is Stablest and It Ain’t Over Till It’s Over conjectures hold not just for 
functions on the uniform-distribution discrete cube, but for functions on arbitrary finite product 
probability spaces. Harmonic analysis results on influences have often considered the biased
product distribution on the discrete cube (see, e.g., mmmm); and, some recent works involving
influences and noise stability have considered functions on product sets $[q]^n$ endowed with the
uniform distribution (e.g., .D 153] ). But since there doesn’t appear to be a unified treatment for the
general case in the literature, we give the necessary definitions here. 

Let $(\Omega_1, \mu_1), \dots, (\Omega_n, \mu_n)$ be probability spaces and let $(\Omega, \mu)$ denote the product probability
space. Let

$$f : \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}$$

be any real-valued function on $\Omega$.

Definition 2.3 The influence of the $i$th coordinate on $f$ is

$$\mathrm{Inf}_i(f) = \mathop{\mathrm{E}}_{\mu}\big[\mathrm{Var}_{\mu_i}[f]\big].$$

Note that for boolean functions $f : \{-1,1\}^n \to \{-1,1\}$ this agrees with the classical notion of
influences introduced to computer science by Ben-Or and Linial 0. When the domain $\{-1,1\}^n$
has a $p$-biased distribution, our notion differs from that of, say, m by a multiplicative factor of
$4p(1 - p)$. We believe the above definition is more natural, and in any case it is easy to pass between
the two. 

To define noise stability, we first define the $T_\rho$ operator on the space of functions $f$:

Definition 2.4 For any $0 \le \rho \le 1$, the operator $T_\rho$ is defined by

$$(T_\rho f)(\omega_1, \dots, \omega_n) = \mathrm{E}[f(\omega'_1, \dots, \omega'_n)], \tag{4}$$

where each $\omega'_i$ is an independent random variable defined to equal $\omega_i$ with probability $\rho$ and to be
randomly drawn from $\mu_i$ with probability $1 - \rho$.

We remark that this definition agrees with that of the “Bonami-Beckner operator” introduced 
in the context of boolean functions by KKL |30j and also with its generalization to [q] n from 551 . 
For more on this operator, see Wolff m- With this definition in place, we can define noise stability: 

Definition 2.5 The noise stability of $f$ at $\rho \in [0,1]$ is

$$\mathbb{S}_\rho(f) = \mathrm{E}[f \cdot T_\rho f].$$
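Definitions 2.4 and 2.5 can be evaluated by direct enumeration on a small finite product space. The sketch below is illustrative and not from the paper: each space $(\Omega_i, \mu_i)$ is represented, by assumption, as a Python dict mapping outcomes to probabilities, and the function names are hypothetical.

```python
from itertools import product

def t_rho(f, spaces, rho, omega):
    """(T_rho f)(omega): each coordinate is kept with probability rho and
    otherwise redrawn from its own measure mu_i.

    `spaces` is a list of dicts mapping outcomes to probabilities.
    """
    total = 0.0
    for new in product(*[list(mu) for mu in spaces]):
        weight = 1.0
        for mu, old_i, new_i in zip(spaces, omega, new):
            # P[omega'_i = new_i | omega_i = old_i]
            weight *= rho * (new_i == old_i) + (1 - rho) * mu[new_i]
        total += weight * f(new)
    return total

def noise_stability(f, spaces, rho):
    """S_rho(f) = E[f * T_rho f] under the product measure."""
    total = 0.0
    for omega in product(*[list(mu) for mu in spaces]):
        p = 1.0
        for mu, w in zip(spaces, omega):
            p *= mu[w]
        total += p * f(omega) * t_rho(f, spaces, rho, omega)
    return total

if __name__ == "__main__":
    cube = [{-1: 0.5, 1: 0.5}] * 3
    print(noise_stability(lambda x: x[0], cube, 0.3))  # dictator: equals rho
```

On the uniform cube this reproduces the Fourier formula $\sum_S \rho^{|S|} \hat{f}(S)^2$; for instance a dictator has stability $\rho$ and Majority on 3 bits has stability $0.40625$ at $\rho = 1/2$. The same code accepts biased or non-binary coordinate spaces.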

For the It Ain’t Over Till It’s Over problem, we introduce a new operator $V_\rho$:

Definition 2.6 For any $\rho \in [0,1]$, the operator $V_\rho$ is defined as follows. The operator takes a
function $f : \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}$ to a function $g : \Omega_1 \times \cdots \times \Omega_n \times \{0,1\}^n \to \mathbb{R}$, where $\{0,1\}^n$ is
equipped with the $(1 - \rho, \rho)^{\otimes n}$ measure. It is defined as follows:

$$(V_\rho f)(\omega_1, \dots, \omega_n, x_1, \dots, x_n) = \mathop{\mathrm{E}}_{\omega'}\big[f(x_1 \omega_1 + (1 - x_1)\omega'_1, \dots, x_n \omega_n + (1 - x_n)\omega'_n)\big].$$
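In words, $V_\rho f$ reveals the coordinates with $x_i = 1$ and averages $f$ over the rest. A brute-force sketch of this operator follows; it is not the paper's construction but an illustration, again assuming each space is a dict from outcomes to probabilities, with hypothetical function names.

```python
from itertools import product

def v_rho_value(f, spaces, omega, x):
    """(V_rho f)(omega, x): coordinates with x_i = 1 keep omega_i, the others
    are averaged over an independent fresh draw from mu_i."""
    total = 0.0
    for fresh in product(*[list(mu) for mu in spaces]):
        weight = 1.0
        for mu, w_new in zip(spaces, fresh):
            weight *= mu[w_new]
        point = tuple(w if xi == 1 else w_new
                      for w, xi, w_new in zip(omega, x, fresh))
        total += weight * f(point)
    return total

def prob_above(f, spaces, rho, threshold):
    """P[V_rho f > threshold], with omega drawn from the product measure and
    x drawn from the (1 - rho, rho) product measure on {0,1}^n."""
    n = len(spaces)
    total = 0.0
    for omega in product(*[list(mu) for mu in spaces]):
        p_omega = 1.0
        for mu, w in zip(spaces, omega):
            p_omega *= mu[w]
        for x in product((0, 1), repeat=n):
            p_x = 1.0
            for xi in x:
                p_x *= rho if xi == 1 else 1 - rho
            if v_rho_value(f, spaces, omega, x) > threshold:
                total += p_omega * p_x
    return total
```

For the $\{0,1\}$-valued Majority on 3 uniform bits with $\rho = 1/2$, the election is decided in favor of 1 exactly when at least two revealed votes are $+1$, which has probability $3 \cdot (1/4)^2 \cdot (3/4) + (1/4)^3 = 0.15625$; `prob_above(maj, cube, 0.5, 0.9)` returns this value.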
Finally, we would like to note that our definitions are valid for functions $f$ into the reals, although
our motivation is usually $\{-1,1\}$-valued functions. Our proofs of the Majority Is Stablest and It
Ain’t Over Till It’s Over conjectures will hold in the setting of functions $f : \Omega_1 \times \cdots \times \Omega_n \to [-1,1]$
(note that Conjecture 1.1 requires this generalized range). For notational simplicity, though, we
will give our proofs for functions into $[0,1]$; the reader can easily convert such results to the $[-1,1]$
case by the linear transformation $f \mapsto 2f - 1$, which interacts in a simple way with the definitions
of $\mathrm{Inf}_i$, $\mathbb{S}_\rho$ and $V_\rho$.


2.3 Majority Is Stablest 
2.3.1 About the problem 

The Majority Is Stablest conjecture, Conjecture 1.1, was first formally stated in EH. However
the notion of Hamming balls having the highest noise stability in various senses has been widely
spread among the community studying discrete Fourier analysis. Indeed, already in KKL’s 1988
paper m there is the suggestion that Hamming balls and subcubes should maximize a certain
noise stability-like quantity. In [2], it was shown that every “asymptotically noise stable” function
is correlated with a weighted majority function; also, in m it was shown that the majority function
asymptotically maximizes a high-norm analog of $\mathbb{S}_\rho$.

More concretely, strong motivation for getting sharp bounds on the noise stability of low- 
influence functions came from two 2002 papers, one by Kalai eh on social choice and one by 
Khot |2I! on PCPs and hardness of approximation. We briefly discuss these two papers below. 


Kalai ’02, Arrow’s Impossibility Theorem: Suppose $n$ voters rank three candidates, A,
B, and C, and a social choice function $f : \{-1,1\}^n \to \{-1,1\}$ is used to aggregate the rankings, as
follows: $f$ is applied to the $n$ A-vs.-B preferences to determine whether A or B is globally preferred;
then the same happens for A-vs.-C and B-vs.-C. The outcome is termed “non-rational” if the global
ranking has A preferable to B preferable to C preferable to A (or if the other cyclic possibility
occurs). Arrow’s Impossibility Theorem from the theory of social choice states that under some
mild restrictions on $f$ (such as $f$ being odd; i.e., $f(-x) = -f(x)$), the only functions which never
admit non-rational outcomes given rational voters are the dictator functions $f(x) = \pm x_i$.

Kalai EH studied the probability of a rational outcome given that the $n$ voters vote
independently and at random from the 6 possible rational rankings. He showed that the probability of
a rational outcome in this case is precisely $3/4 + (3/4)\mathbb{S}_{1/3}(f)$. Thus it is natural to ask which
function $f$ with small influences is most likely to produce a rational outcome. Instead of
considering small influences, Kalai considered the essentially stronger assumption that $f$ is
“transitive-symmetric”; i.e., that for all $1 \le i < j \le n$ there exists a permutation $\sigma$ on $[n]$ with $\sigma(i) = j$
such that $f(x_1, \dots, x_n) = f(x_{\sigma(1)}, \dots, x_{\sigma(n)})$ for all $(x_1, \dots, x_n)$. Kalai conjectured that
Majority was the transitive-symmetric function that maximized $3/4 + (3/4)\mathbb{S}_{1/3}(f)$ (in fact, he made a
stronger conjecture, but this conjecture is false; see Section 5). He further observed that this would
imply that in any transitive-symmetric scheme the probability of a rational outcome is at most
$3/4 + \frac{3}{2\pi}\arcsin(1/3) + o_n(1) \approx .9123$; however, Kalai could only prove the weaker bound $.9192$.
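Kalai's formula can be checked by exhaustive enumeration for a very small electorate. The following sketch is an illustration, not code from any cited work; the function names are hypothetical, and the running time is $6^n$, so it is only practical for a handful of voters. It uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations, product

def majority(votes):
    """Majority of an odd number of +/-1 votes."""
    return 1 if sum(votes) > 0 else -1

def rational_probability(f, n):
    """Exact probability that f aggregates n independent uniformly random
    rankings of A, B, C into a non-cyclic (rational) outcome."""
    rankings = list(permutations("ABC"))

    def pref(r, a, b):
        return 1 if r.index(a) < r.index(b) else -1

    good = 0
    for profile in product(rankings, repeat=n):
        ab = f([pref(r, "A", "B") for r in profile])
        bc = f([pref(r, "B", "C") for r in profile])
        ca = f([pref(r, "C", "A") for r in profile])
        if not (ab == bc == ca):  # all three equal means a cycle
            good += 1
    return Fraction(good, len(rankings) ** n)

def noise_stability(f, n, rho):
    """Exact S_rho(f) = sum_S rho^{|S|} hat f(S)^2 on the uniform cube."""
    points = list(product((-1, 1), repeat=n))
    total = Fraction(0)
    for S in product((0, 1), repeat=n):
        coeff = Fraction(0)
        for x in points:
            chi = 1
            for xj, sj in zip(x, S):
                if sj:
                    chi *= xj
            coeff += f(x) * chi
        coeff /= len(points)
        total += rho ** sum(S) * coeff ** 2
    return total
```

For Majority on 3 voters both sides equal $17/18$: the enumeration finds 204 rational outcomes among the 216 profiles, and $\mathbb{S}_{1/3}(\mathrm{Maj}_3) = 7/27$ gives $3/4 + (3/4)(7/27) = 17/18$.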

Khot ’02, Unique Games and hardness of approximating 2-CSPs: In computer science,
many combinatorial optimization problems are NP-hard, meaning it is unlikely there are efficient
algorithms that always find the optimal solution. Hence there has been extensive interest in
understanding the complexity of approximating the optimal solution. Consider for example “$k$-variable
constraint satisfaction problems” ($k$-CSPs) in which the input is a set of variables over a finite
domain, along with some constraints on $k$-sets of the variables, restricting what sets of values they
can simultaneously take. We say a problem has “$(c, s)$-hardness” if it is NP-hard, given a $k$-CSP
instance in which the optimal assignment satisfies a $c$-fraction of the constraints, for an algorithm
to find an assignment that satisfies an $s$-fraction of the constraints. In this case we also say that
the problem is “$s/c$-hard to approximate”.

The PCP and Parallel Repetition theorems have led to many impressive results showing that it
is NP-hard even to give a-approximations for various problems, especially $k$-CSPs for $k \ge 3$. For
example, letting MAX-$k$LIN($q$) denote the problem of satisfying $k$-variable linear equations over
$\mathbb{Z}_q$, it is known m that MAX-$k$LIN($q$) has $(1 - \epsilon, 1/q + \epsilon)$-hardness for all $k \ge 3$, and this is sharp.
However it seems that current PCP theorems are not strong enough to give sharp hardness of
approximation results for 2-CSPs (e.g., constraint satisfaction problems on graphs). The influential
paper of Khot m introduced the “Unique Games Conjecture” (UGC) in order to make progress
on 2-CSPs; UGC states that a certain 2-CSP over a large domain has $(1 - \epsilon, \epsilon)$-hardness.

Interestingly, it seems that using UGC to prove hardness results for other 2-CSPs typically 
crucially requires strong results about influences and noise stability of boolean functions. For 
example, El’s analysis of MAX-2LIN(2) required an upper bound on $\mathbb{S}_{1-\epsilon}(f)$ for small $\epsilon$ among
balanced functions $f : \{-1,1\}^n \to \{-1,1\}$ with small influences; to get this, Khot used the following
deep result of Bourgain m from 2001: 

Theorem 2.7 (Bourgain [IT]) If $f : \{-1,1\}^n \to \{-1,1\}$ satisfies $\mathrm{E}[f] = 0$ and $\mathrm{Inf}_i(f) \le 10^{-d}$
for all $i \in [n]$, then

$$\sum_{|S| > d} \hat{f}(S)^2 \ge d^{-1/2 - O(\sqrt{\log\log d / \log d})} = d^{-1/2 - o(1)}.$$

Note that Bourgain’s theorem has the following easy corollary: 

Corollary 2.8 If $f : \{-1,1\}^n \to \{-1,1\}$ satisfies $\mathrm{E}[f] = 0$ and $\mathrm{Inf}_i(f) \le 2^{-O(1/\epsilon)}$ for all $i \in [n]$,
then

$$\mathbb{S}_{1-\epsilon}(f) \le 1 - \epsilon^{1/2 + o(1)}.$$

This corollary enabled Khot to show $(1 - \epsilon, 1 - \epsilon^{1/2 + o(1)})$-hardness for MAX-2LIN(2), which is close
to sharp (the algorithm of Goemans-Williamson [27] achieves $1 - O(\sqrt{\epsilon})$). As an aside, we note
that Khot and Vishnoi recently used Corollary 2.8 to prove that negative type metrics do not
embed into $\ell_1$ with constant distortion.

Another example of this comes from the work of ESI- Among other things, M studied the 
MAX-CUT problem: Given an undirected graph, partition the vertices into two parts so as to 
maximize the number of edges with endpoints in different parts. The paper introduced the Majority 
Is Stablest Conjecture 1.1 and showed that together with UGC it implied $(\frac{1}{2} + \frac{1}{2}\rho - \epsilon, \frac{1}{2} + \frac{1}{\pi}\arcsin\rho + \epsilon)$-hardness
for MAX-CUT. In particular, optimizing over $\rho$ (taking $\rho \approx .69$) implies MAX-CUT is
.878-hard to approximate, matching the groundbreaking algorithm of Goemans and Williamson EH.
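The optimization over $\rho$ mentioned above is a one-dimensional minimization of the ratio $s/c$ of the hardness pair with $\epsilon$ dropped. A minimal numerical sketch (a plain grid search, chosen here only for simplicity; function names are hypothetical):

```python
import math

def maxcut_ratio(rho):
    """s/c for the pair (1/2 + rho/2, 1/2 + arcsin(rho)/pi), with epsilon dropped."""
    return (0.5 + math.asin(rho) / math.pi) / (0.5 + 0.5 * rho)

def best_ratio(steps=100000):
    """Grid search over rho in (0, 1) for the smallest ratio."""
    best_rho, best_val = None, float("inf")
    for k in range(1, steps):
        rho = k / steps
        val = maxcut_ratio(rho)
        if val < best_val:
            best_rho, best_val = rho, val
    return best_rho, best_val

if __name__ == "__main__":
    print(best_ratio())
```

The search returns a minimizer close to $\rho = .69$ and a minimum ratio close to $.878$, in agreement with the figures quoted in the text.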

2.3.2 Consequences of confirming the conjecture 

In Theorem roi we confirm a generalization of the Majority Is Stablest conjecture. We give a 
slightly simplified statement of this theorem here: 

Theorem roi Let $f : \Omega_1 \times \cdots \times \Omega_n \to [0,1]$ be a function on a discrete product probability space
and assume that for each $i$ the minimum probability of any atom in $\Omega_i$ is at least $\alpha \le 1/2$. Further
assume that $\mathrm{Inf}_i(f) \le \tau$ for all $i$. Let $\mu = \mathrm{E}[f]$. Then for any $0 \le \rho < 1$,

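$$\mathbb{S}_\rho(f) \le \lim_{n \to \infty} \mathbb{S}_\rho\big(\mathrm{Thr}_n^{(\mu)}\big) + O\Big(\frac{\log\log(1/\tau)}{\log(1/\tau)}\Big),$$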

where $\mathrm{Thr}_n^{(\mu)} : \{-1,1\}^n \to \{0,1\}$ denotes the symmetric threshold function with expectation closest
to $\mu$, and the $O(\cdot)$ hides a constant depending only on $\alpha$ and $1 - \rho$.

We now give some consequences of this theorem: 

Theorem 2.9 In the terminology of Kalai ra any odd, balanced social choice function f with 
either 

• $o_n(1)$ influences or

• such that $f$ is transitive

has probability at most $3/4 + \frac{3}{2\pi}\arcsin(1/3) + o_n(1) \approx .9123$ of producing a rational outcome.
The majority function on $n$ inputs achieves this bound, $3/4 + \frac{3}{2\pi}\arcsin(1/3) + o_n(1)$.

By looking at the series expansion of $\frac{2}{\pi}\arcsin(1 - \epsilon)$ we obtain the following strengthening of
Corollary 2.8:

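Corollary 2.10 If $f : \{-1,1\}^n \to \{-1,1\}$ satisfies $\mathrm{E}[f] = 0$ and $\mathrm{Inf}_i(f) \le \epsilon^{O(1/\epsilon)}$ for all $i \in [n]$,
then

$$\mathbb{S}_{1-\epsilon}(f) \le 1 - \big(\tfrac{\sqrt{8}}{\pi} - o(1)\big)\epsilon^{1/2}.$$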
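The constant $\sqrt{8}/\pi$ comes from the leading term of the expansion of $\frac{2}{\pi}\arcsin(1-\epsilon)$ near $\epsilon = 0$. The following worked computation is a sketch of that standard expansion, included for the reader's convenience rather than taken from the proof; it uses only $\arcsin t = \frac{\pi}{2} - \arccos t$ and $\cos\theta = 1 - \theta^2/2 + O(\theta^4)$.

```latex
% With theta = arccos(1 - eps), cos(theta) = 1 - eps gives theta = sqrt(2 eps) (1 + O(eps)).
\begin{align*}
\frac{2}{\pi}\arcsin(1-\epsilon)
  &= \frac{2}{\pi}\Big(\frac{\pi}{2} - \arccos(1-\epsilon)\Big) \\
  &= 1 - \frac{2}{\pi}\sqrt{2\epsilon}\,\big(1 + O(\epsilon)\big) \\
  &= 1 - \frac{\sqrt{8}}{\pi}\,\epsilon^{1/2} + O(\epsilon^{3/2}).
\end{align*}
```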

Using Corollary 2.10 instead of Corollary 2.8 in Khot m we obtain

Corollary 2.11 MAX-2LIN(2) and MAX-2SAT have $(1 - \epsilon, 1 - O(\epsilon^{1/2}))$-hardness.

More generally, [32] now implies

Corollary 2.12 MAX-CUT has $(\frac{1}{2} + \frac{1}{2}\rho - \epsilon, \frac{1}{2} + \frac{1}{\pi}\arcsin\rho + \epsilon)$-hardness for each $\rho$ and all $\epsilon > 0$,
assuming UGC only. In particular, the Goemans-Williamson .878-approximation algorithm is best
possible, assuming UGC only.

The following two results are consequences of a generalization of “Majority Is Stablest” as shown
in [32]:

Theorem 2.13 UGC implies that for each $\epsilon > 0$ there exists $q = q(\epsilon)$ such that MAX-2LIN($q$) has
$(1 - \epsilon, \epsilon)$-hardness. Indeed, this statement is equivalent to UGC.

Theorem 2.14 The MAX-$q$-CUT problem, i.e. Approximate $q$-Coloring, has $(1 - 1/q + q^{-2+o(1)})$-hardness
factor, assuming UGC only. This asymptotically matches the approximation factor
obtained by Frieze and Jerrum
2.4 It Ain’t Over Till It’s Over


The It Ain’t Over Till It’s Over conjecture was originally made by Kalai and Friedgut m in
studying social indeterminacy ESI EH. The setting here is similar to the setting of Arrow’s Theorem
from Section 2.3.1 except that there are an arbitrary finite number of candidates. Let $R$ denote
the (asymmetric) relation given on the candidates when the monotone social choice function / is 
used. Kalai showed that if / has small influences, then the It Ain’t Over Till It’s Over Conjecture 
implies that every possible relation R is achieved with probability bounded away from 0. Since its 
introduction in 2001, the It Ain’t Over Till It’s Over problem has circulated widely in the
community studying harmonic analysis of boolean functions. The conjecture was given as one of the top
unsolved problems in the field at a workshop at Yale in late 2004. 

In Theorem m we confirm the It Ain’t Over Till It’s Over conjecture and generalize it to 
functions on arbitrary finite product probability spaces with means bounded away from 0 and 1 . 
Further, the asymptotics we give show that symmetric threshold functions (e.g., Majority in the 
case of mean 1/2) are the “worst” examples. We give a slightly simplified statement of Theorem 14.01 
here: 

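Theorem roi Let $0 < \rho < 1$ and let $f : \Omega_1 \times \cdots \times \Omega_n \to [0,1]$ be a function on a discrete product
probability space; assume that for each $i$ the minimum probability of any atom in $\Omega_i$ is at least
$\alpha \le 1/2$. Then there exists $\epsilon(\rho, \mu) > 0$ such that if $\epsilon < \epsilon(\rho, \mu)$ and $\mathrm{Inf}_i(f) \le \epsilon^{O(\sqrt{\log(1/\epsilon)})}$ for all $i$
and $\mu = \mathrm{E}[f]$ then

$$\mathrm{P}[V_\rho f > 1 - \delta] \le \epsilon$$

and

$$\mathrm{P}[V_\rho f < \delta] \le \epsilon$$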

provided 

where the O(-) hides a constant depending only on a. 1 — p, p, and 1 — p. 


3 The invariance principle 

3.1 Setup and notation 

In this section we will describe the setup and notation necessary for our invariance principle. Recall
that we are interested in functions on finite product probability spaces, $f : \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}$. For
each $i$, the space of all functions $\Omega_i \to \mathbb{R}$ can be expressed as the span of a finite set of orthonormal
random variables, $X_{i,0} = 1, X_{i,1}, X_{i,2}, X_{i,3}, \dots$; then $f$ can be written as a multilinear polynomial
in the $X_{i,j}$'s. In fact, it will be convenient for us to mostly disregard the $\Omega_i$'s and work directly with
sets of orthonormal random variables; in this case, we can even drop the restriction of finiteness.
We thus begin with the following definition:

Definition 3.1 We call a collection of finitely many orthonormal real random variables, one of
which is the constant 1, an orthonormal ensemble. We will write a typical sequence of $n$
orthonormal ensembles as $\mathcal{X} = (\mathcal{X}_1, \dots, \mathcal{X}_n)$, where $\mathcal{X}_i = \{X_{i,0} = 1, X_{i,1}, \dots, X_{i,m_i}\}$. We call a
sequence of orthonormal ensembles $\mathcal{X}$ independent if the ensembles are independent families of
random variables.
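For a finite probability space, one way to produce such an ensemble is to run Gram-Schmidt on the constant function and the indicators of the atoms, with respect to the inner product $\langle f, g \rangle = \mathrm{E}[fg]$. The sketch below illustrates this; the choice of starting vectors and the function names are assumptions made here for illustration, and any other orthonormal basis containing the constant 1 would serve equally well. All atom probabilities are assumed to be strictly positive.

```python
import math

def inner(probs, f, g):
    """<f, g> = E[f g] on a finite space whose atoms have probabilities `probs`."""
    return sum(p * a * b for p, a, b in zip(probs, f, g))

def orthonormal_ensemble(probs):
    """Return functions X_0 = 1, X_1, ..., X_{m-1} (as lists of values on the
    m atoms) that are orthonormal under the measure `probs`."""
    m = len(probs)
    basis = [[1.0] * m]
    for k in range(m - 1):
        # Start from the indicator of atom k and remove its components
        # along the functions already in the basis.
        v = [1.0 if j == k else 0.0 for j in range(m)]
        for b in basis:
            coef = inner(probs, v, b)
            v = [vj - coef * bj for vj, bj in zip(v, b)]
        norm = math.sqrt(inner(probs, v, v))
        basis.append([vj / norm for vj in v])
    return basis

if __name__ == "__main__":
    print(orthonormal_ensemble([0.5, 0.5]))        # recovers the +/-1 bit
    print(orthonormal_ensemble([0.2, 0.3, 0.5]))
```

For the uniform two-point space this returns the constant 1 together with the function taking values $+1$ and $-1$, i.e., the usual $\pm 1$ bit.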
We will henceforth be concerned only with independent sequences of orthonormal ensembles,
and we will call these sequences of ensembles, for brevity. 

Remark 3.2 Given a sequence of independent random variables $X_1, \dots, X_n$ with $\mathrm{E}[X_i] = 0$ and
$\mathrm{E}[X_i^2] = 1$ (as in Theorem 2.1), we can view them as a sequence of ensembles $\mathcal{X}$ by renaming
$X_i = X_{i,1}$ and setting $X_{i,0} = 1$ as required.

Definition 3.3 We denote by $\mathcal{G}$ the Gaussian sequence of ensembles, in which $\mathcal{G}_i = \{G_{i,0} = 1, G_{i,1}, G_{i,2}, \dots\}$
and all $G_{i,j}$'s with $j \ge 1$ are independent standard Gaussians.

As mentioned, we will be interested in multilinear polynomials over sequences of ensembles. 
By this we mean sums of products of the random variables, where each product is obtained by 
multiplying one random variable from each ensemble. 

Definition 3.4 A multi-index $\sigma$ is a sequence $(\sigma_1, \dots, \sigma_n)$ in $\mathbb{N}^n$; the degree of $\sigma$, denoted $|\sigma|$, is
$|\{i \in [n] : \sigma_i > 0\}|$. Given a doubly-indexed set of indeterminates $\{x_{i,j}\}_{i \in [n], j \in \mathbb{N}}$, we write $x_\sigma$ for the
monomial $\prod_{i=1}^n x_{i,\sigma_i}$. We now define a multilinear polynomial over such a set of indeterminates
to be any expression

$$Q(x) = \sum_{\sigma} c_\sigma x_\sigma \tag{5}$$

where the $c_\sigma$'s are real constants, all but finitely many of which are zero. The degree of $Q(x)$ is
$\max\{|\sigma| : c_\sigma \ne 0\}$, at most $n$. We also use the notation

$$Q^{\le d}(x) = \sum_{|\sigma| \le d} c_\sigma x_\sigma$$

and the analogous $Q^{=d}(x)$ and $Q^{>d}(x)$.

Naturally, we will consider applying multilinear polynomials $Q$ to sequences of ensembles $\mathcal{X}$;
the distribution of these random variables $Q(\mathcal{X})$ is the subject of our invariance principle. Since
$Q(\mathcal{X})$ can be thought of as a function on a product space $\Omega_1 \times \cdots \times \Omega_n$ as described at the beginning
of this section, there is a consistent way to define the notions of influences, $T_\rho$, and noise stability
from Section 2.2. For example, the “influence of the $i$th ensemble on $Q$” is

$$\mathrm{Inf}_i(Q(\mathcal{X})) = \mathrm{E}\big[\mathrm{Var}[Q(\mathcal{X}) \mid \mathcal{X}_1, \dots, \mathcal{X}_{i-1}, \mathcal{X}_{i+1}, \dots, \mathcal{X}_n]\big].$$

Using independence and orthonormality, it is easy to show the following formulas, familiar from 
harmonic analysis of boolean functions: 

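Proposition 3.5 Let $\mathcal{X}$ be a sequence of ensembles and $Q$ a multilinear polynomial as in (5).
Then

$$\mathrm{E}[Q(\mathcal{X})] = c_{\mathbf{0}}; \qquad \mathrm{E}[Q(\mathcal{X})^2] = \sum_{\sigma} c_\sigma^2; \qquad \mathrm{Var}[Q(\mathcal{X})] = \sum_{|\sigma| > 0} c_\sigma^2;$$

$$\mathrm{Inf}_i(Q(\mathcal{X})) = \sum_{\sigma : \sigma_i > 0} c_\sigma^2; \qquad T_\rho Q(\mathcal{X}) = \sum_{\sigma} \rho^{|\sigma|} c_\sigma \mathcal{X}_\sigma; \qquad \mathbb{S}_\rho(Q(\mathcal{X})) = \sum_{\sigma} \rho^{|\sigma|} c_\sigma^2.$$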

Note that in each case above, the formula does not depend on the sequence of ensembles $\mathcal{X}$; it only depends on $Q$. Thus we are justified in henceforth writing $\mathrm{E}[Q]$, $\mathrm{E}[Q^2]$, $\mathrm{Var}[Q]$, $\mathrm{Inf}_i(Q)$, and $\mathbb{S}_\rho(Q)$, and in treating $T_\rho$ as a formal operator on multilinear polynomials:

Definition 3.6 For $\rho\in[0,1]$ we define the operator $T_\rho$ as acting formally on multilinear polynomials $Q(x)$ as in (5) by

$$(T_\rho Q)(x)=\sum_{\sigma}\rho^{|\sigma|}c_\sigma x_\sigma.$$

Note that for every sequence of ensembles, Definition 3.6 agrees with Definition 2.4.
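
The formulas of Proposition 3.5 can be checked by brute force in the simplest setting. The following is only a sketch, assuming ensembles of the form $\{1, X_i\}$ with $X_i$ a uniform $\pm1$ bit, so that multi-indices correspond to subsets $S\subseteq[n]$; the helper names and the example polynomial `Q` are hypothetical and not taken from the paper.

```python
from fractions import Fraction
from itertools import product


def evaluate(coeffs, x):
    """Evaluate Q(x) = sum_S c_S * prod_{i in S} x_i for coeffs = {frozenset S: c_S}."""
    total = Fraction(0)
    for S, c in coeffs.items():
        term = c
        for i in S:
            term *= x[i]
        total += term
    return total


def expectation(g, n):
    """Exact average of g over the uniform measure on {-1, 1}^n."""
    return sum(g(x) for x in product((-1, 1), repeat=n)) / Fraction(2 ** n)


def influence_by_enumeration(coeffs, n, i):
    """Inf_i(Q) = E[Var[Q | all coordinates except i]], computed directly."""
    def conditional_variance(x):
        hi = evaluate(coeffs, x[:i] + (1,) + x[i + 1:])
        lo = evaluate(coeffs, x[:i] + (-1,) + x[i + 1:])
        return (hi - lo) ** 2 / 4  # variance of a uniform two-point law
    return expectation(conditional_variance, n)


def noise_operator(coeffs, rho):
    """Formal T_rho of Definition 3.6: multiply c_S by rho^{|S|}."""
    return {S: rho ** len(S) * c for S, c in coeffs.items()}


# Hypothetical example polynomial on three bits (illustration only).
Q = {
    frozenset(): Fraction(1, 2),
    frozenset({0}): Fraction(1, 4),
    frozenset({0, 1}): Fraction(1, 4),
    frozenset({1, 2}): Fraction(-1, 2),
}
RHO = Fraction(1, 3)
```

Exact rational arithmetic is used so that the enumerated quantities can be compared with the coefficient formulas by equality rather than up to rounding.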

We end this section with a short discussion of “low-degree influences”, a notion that has proven 
crucial in the analysis of PCPs (see, e.g., ESI)- 

Definition 3.7 The $d$-low-degree influence of the $i$th ensemble on $Q(\mathcal{X})$ is

$$\mathrm{Inf}_i^{\le d}(Q(\mathcal{X}))=\mathrm{Inf}_i^{\le d}(Q)=\sum_{\sigma:\,|\sigma|\le d,\ \sigma_i>0}c_\sigma^2.$$

Note that this gives a way to define low-degree influences $\mathrm{Inf}_i^{\le d}(f)$ for functions $f:\Omega_1\times\cdots\times\Omega_n\to\mathbb{R}$ on finite product spaces.

There isn’t an especially natural interpretation of $\mathrm{Inf}_i^{\le d}(f)$. However, the notion is important for PCPs due to the fact that a function with variance 1 cannot have too many coordinates with substantial low-degree influence; this is reflected in the following easy proposition:

Proposition 3.8 Suppose $Q$ is a multilinear polynomial as in (5). Then

$$\sum_{i}\mathrm{Inf}_i^{\le d}(Q)\le d\cdot\mathrm{Var}[Q].$$


3.2 Hypercontractivity 

As mentioned in Section 2.1, our invariance principle requires the ensembles involved to be hypercontractive in a certain sense. Recall that a random variable $Y$ is said to be “$(p,q,\eta)$-hypercontractive” for $1\le p\le q<\infty$ and $0<\eta<1$ if

$$\|a+\eta Y\|_q\le\|a+Y\|_p \tag{6}$$

for all $a\in\mathbb{R}$. This type of hypercontractivity was introduced (with slightly different notation) in [JJSj. Some basic facts about hypercontractivity are explained in Appendix A; much more can be found in @nj. Here we just note that for $q>2$ a random variable $Y$ is $(2,q,\eta)$-hypercontractive with some $\eta\in(0,1)$ if and only if $\mathrm{E}[Y]=0$ and $\mathrm{E}[|Y|^q]<\infty$. Also, if $Y$ is $(2,q,\eta)$-hypercontractive then $\eta\le(q-1)^{-1/2}$.

We now define our extension of the notion of hypercontractivity to sequences of ensembles: 

Definition 3.9 Let $\mathcal{X}$ be a sequence of ensembles. For $1\le p\le q<\infty$ and $0<\eta<1$ we say that $\mathcal{X}$ is $(p,q,\eta)$-hypercontractive if

$$\|(T_\eta Q)(\mathcal{X})\|_q\le\|Q(\mathcal{X})\|_p$$

for every multilinear polynomial $Q$ over $\mathcal{X}$.

Since $T_\eta$ is a contractive semi-group, we have

Remark 3.10 If $\mathcal{X}$ is $(p,q,\eta)$-hypercontractive then it is $(p,q,\eta')$-hypercontractive for any $0<\eta'\le\eta$.

There is a related notion of hypercontractivity for sets of random variables which considers all polynomials in the variables, not just multilinear polynomials; see, e.g., Janson [29]. Several of the properties of this notion of hypercontractivity carry over to our setting of sequences of ensembles. In particular, the following facts can easily be proved by repeating the analogous proofs in [29]; for completeness, we give the proofs in Appendix A.

Proposition 3.11 Suppose $\mathcal{X}$ is a sequence of $n_1$ ensembles and $\mathcal{Y}$ is an independent sequence of $n_2$ ensembles. Assume both are $(p,q,\eta)$-hypercontractive. Then the sequence of ensembles $\mathcal{X}\cup\mathcal{Y}=(\mathcal{X}_1,\dots,\mathcal{X}_{n_1},\mathcal{Y}_1,\dots,\mathcal{Y}_{n_2})$ is also $(p,q,\eta)$-hypercontractive.

Proposition 3.12 Let $\mathcal{X}$ be a $(2,q,\eta)$-hypercontractive sequence of ensembles and $Q$ a multilinear polynomial over $\mathcal{X}$ of degree $d$. Then

$$\|Q(\mathcal{X})\|_q\le\eta^{-d}\,\|Q(\mathcal{X})\|_2.$$

In light of Proposition 3.11, to check that a sequence of ensembles is $(p,q,\eta)$-hypercontractive it is enough to check that each ensemble individually is $(p,q,\eta)$-hypercontractive (as a “sequence” of length 1); in turn, it is easy to see that this is equivalent to checking that for each $i$, all linear combinations of the random variables $X_{i,1},X_{i,2},\dots$ are hypercontractive in the traditional sense of (6).

We end this section by recording the optimal hypercontractivity constants for the ensembles we consider. The result for $\pm1$ Rademacher variables is well known and due originally to Bonami 7| and independently Beckner j^j; the same result for Gaussian and uniform random variables is also well known and in fact follows easily from the Rademacher case. The optimal hypercontractivity constants for general finite spaces were recently determined by Wolff m (see also EH):

Theorem 3.13 Let $X$ denote either a uniformly random $\pm1$ bit, a standard one-dimensional Gaussian, or a random variable uniform on $[-\sqrt3,\sqrt3]$. Then $X$ is $(2,q,(q-1)^{-1/2})$-hypercontractive.

Theorem 3.14 (Wolff) Let $X$ be any mean-zero random variable on a finite probability space in which the minimum nonzero probability of any atom is $\alpha\le1/2$. Then $X$ is $(2,q,\eta_q(\alpha))$-hypercontractive, where

$$\eta_q(\alpha)=\left(\frac{A^{1/q'}-A^{-1/q'}}{A^{1/q}-A^{-1/q}}\right)^{-1/2}\qquad\text{with } A=\frac{1-\alpha}{\alpha},\quad 1/q+1/q'=1.$$

Note the following special case:

Proposition 3.15

$$\eta_3(\alpha)=\bigl(A^{1/3}+A^{-1/3}\bigr)^{-1/2}\sim\alpha^{1/6}\quad\text{as }\alpha\to0,$$

and also

$$\tfrac12\,\alpha^{1/6}\le\eta_3(\alpha)\le2^{-1/2},$$

for all $\alpha\in[0,1/2]$.
For general random variables with bounded moments we have the following results, proved in Appendix A.

Proposition 3.16 Let $X$ be a mean-zero random variable satisfying $\mathrm{E}[|X|^q]<\infty$. Then $X$ is $(2,q,\eta_q)$-hypercontractive with

$$\eta_q=\frac{\|X\|_2}{2\sqrt{q-1}\,\|X\|_q}.$$

In particular, when $\mathrm{E}[X]=0$, $\mathrm{E}[X^2]=1$, and $\mathrm{E}[|X|^3]\le\beta$, we have that $X$ is $(2,3,2^{-3/2}\beta^{-1/3})$-hypercontractive.

Proposition 3.17 Let $X$ be a mean-zero random variable satisfying $\mathrm{E}[|X|^q]<\infty$ and let $V$ be a random variable independent of $X$ with $\mathrm{P}[V=0]=1-\rho$ and $\mathrm{P}[V=1]=\rho$. Then $VX$ is $(2,q,\xi_q)$-hypercontractive with

$$\xi_q=\frac{\|X\|_2}{2\sqrt{q-1}\,\|X\|_q}\cdot\rho^{\frac12-\frac1q}.$$

3.3 Hypotheses for invariance theorems — some families of ensembles 

All of the variants of our invariance principle that we prove in this section will have similar hypotheses. Specifically, they will be concerned with a multilinear polynomial $Q$ over two hypercontractive sequences of ensembles, $\mathcal{X}$ and $\mathcal{Y}$; furthermore, $\mathcal{X}$ and $\mathcal{Y}$ will be assumed to satisfy a “matching moments” condition, as described below. We will now lay out four hypotheses, H1, H2, H3, and H4, that will be used in the theorems of this section. As can easily be seen (using Theorems 3.13 and 3.14 and Proposition 3.15; see also Appendix A), the hypothesis H1 generalizes H2, H3, and H4; hence all proofs will be carried out only in the setting of H1. However, the amount of notation and number of parameters under H1 is quite cumbersome, and the reader who is interested mainly in functions on finite product spaces (H3) or just boolean functions where $\{-1,1\}^n$ has the uniform distribution (H4) may find it easier to proceed through the proofs and results in the restricted cases.


Herewith our hypotheses: 


H1 Let $r\ge3$ be an integer and let $\mathcal{X}$ and $\mathcal{Y}$ be independent sequences of $n$ ensembles which are $(2,r,\eta)$-hypercontractive; recall that $\eta\le(r-1)^{-1/2}$. Assume furthermore that for all $1\le i\le n$ and all sets $\Sigma\subset\mathbb{N}$ with $|\Sigma|<r$, the sequences $\mathcal{X}$ and $\mathcal{Y}$ satisfy the “matching moments” condition

$$\mathrm{E}\Bigl[\prod_{\sigma\in\Sigma}X_{i,\sigma}\Bigr]=\mathrm{E}\Bigl[\prod_{\sigma\in\Sigma}Y_{i,\sigma}\Bigr]. \tag{7}$$

Finally, let $Q$ be a multilinear polynomial as in (5).

We remark that in H1, if $r=3$ then the matching moment conditions hold automatically since the sequences are orthonormal. We also remark that we have added the condition $\eta\le(r-1)^{-1/2}$ so that we can take $\mathcal{Y}=\mathcal{G}$, the Gaussian sequence of ensembles (see Theorem 3.13).

H2 Let $r=3$. Let $\mathcal{X}$ and $\mathcal{Y}$ be independent sequences of ensembles in which each ensemble has only two random variables, $X_{i,0}=1$ and $X_{i,1}=X_i$ (respectively, $Y_{i,0}=1$, $Y_{i,1}=Y_i$), as in Remark 3.2. Further assume that each $X_i$ (respectively $Y_i$) satisfies $\mathrm{E}[X_i]=0$, $\mathrm{E}[X_i^2]=1$ and $\mathrm{E}[|X_i|^3]\le\beta$. Put $\eta=2^{-3/2}\beta^{-1/3}$, so $\mathcal{X}$ and $\mathcal{Y}$ are $(2,3,\eta)$-hypercontractive. Finally, let $Q$ be a multilinear polynomial as in (5).

The hypothesis H2 is used to derive the multilinear version of the Berry-Esseen inequality given in Theorem 2.1.
H3 Let $r=3$ and let $\mathcal{X}$ be a sequence of $n$ ensembles in which the random variables in each ensemble $\mathcal{X}_i$ form a basis for the real-valued functions on some finite probability space $\Omega_i$. Further assume that the least nonzero probability of any atom in any $\Omega_i$ is $\alpha\le1/2$, and let $\eta=\frac12\alpha^{1/6}$. Let $\mathcal{Y}$ be any independent $(2,3,\eta)$-hypercontractive sequence of ensembles. Finally, let $Q$ be a multilinear polynomial as in (5).

We remark that $Q(\mathcal{X})$ in H3 encompasses all real-valued functions $f$ on finite product spaces, including the familiar cases of the $p$-biased discrete cube (for which $\alpha=\min\{p,1-p\}$) and the set $[q]^n$ with uniform measure (for which $\alpha=1/q$). Note also that $\eta\le2^{-1/2}$ so we may take $\mathcal{Y}$ to be the Gaussian sequence of ensembles.

H4 Let $r=4$ and $\eta=3^{-1/2}$. Let $\mathcal{X}$ and $\mathcal{Y}$ be independent sequences of ensembles in which each ensemble has only two random variables, $X_{i,0}=1$ and $X_{i,1}=X_i$ (respectively, $Y_{i,0}=1$, $Y_{i,1}=Y_i$), as in Remark 3.2. Further assume that each $X_i$ (respectively $Y_i$) is either a) a uniformly random $\pm1$ bit; b) a standard one-dimensional Gaussian; or c) uniform on $[-3^{1/2},3^{1/2}]$. Hence $\mathcal{X}$ and $\mathcal{Y}$ are $(2,4,\eta)$-hypercontractive. Finally, let $Q$ be a multilinear polynomial as in (5).

Note that this simplest of all hypotheses allows for arbitrary real-valued functions on the uniform-measure discrete cube $f:\{-1,1\}^n\to\mathbb{R}$. Also, under H4, $Q$ is just a multilinear polynomial in the usual sense over the $X_i$'s or $Y_i$'s; in particular, if $f:\{-1,1\}^n\to\mathbb{R}$ then $Q$ is the “Fourier expansion” of $f$. Finally, note that the matching moments condition (7) holds in H4 since it requires $\mathrm{E}[X_i^3]=\mathrm{E}[Y_i^3]$ for each $i$, and this is true since both equal 0.


3.4 Basic invariance principle, $C^r$ functional version

The essence of our invariance principle is that if $Q$ is of bounded degree and has low influences then the random variables $Q(\mathcal{X})$ and $Q(\mathcal{Y})$ are close in distribution. The simplest way to formulate this conclusion is to say that if $\Psi:\mathbb{R}\to\mathbb{R}$ is a sufficiently nice “test function” then $\Psi(Q(\mathcal{X}))$ and $\Psi(Q(\mathcal{Y}))$ are close in expectation.


Theorem 3.18 Assume hypothesis H1, H2, H3, or H4. Further assume $\mathrm{Var}[Q]\le1$, $\deg(Q)\le d$, and $\mathrm{Inf}_i(Q)\le\tau$ for all $i$. Let $\Psi:\mathbb{R}\to\mathbb{R}$ be a $C^r$ function with $|\Psi^{(r)}|\le B$ uniformly. Then

$$\bigl|\mathrm{E}[\Psi(Q(\mathcal{X}))]-\mathrm{E}[\Psi(Q(\mathcal{Y}))]\bigr|\le\epsilon,$$

where

$$\epsilon=\begin{cases}(2B/r!)\,d\,\eta^{-rd}\,\tau^{r/2-1}&\text{under H1,}\\ B\,30^d\,\beta^d\,\tau^{1/2}&\text{under H2,}\\ B\,(10\alpha^{-1/2})^d\,\tau^{1/2}&\text{under H3,}\\ B\,10^d\,\tau&\text{under H4.}\end{cases}$$

As will be the case in all of our theorems, the results under H2, H3 and H4 are immediate corollaries of the result under H1; one only needs to substitute in $r=3$, $\eta=2^{-3/2}\beta^{-1/3}$, or $r=3$, $\eta=\frac12\alpha^{1/6}$, or $r=4$, $\eta=3^{-1/2}$ (we have also here used that $(1/3)\,d\,2^{9d/2}$ is at most $30^d$ and that $(1/3)\,d\,8^d$ and $(1/12)\,d\,9^d$ are at most $10^d$). Thus it will suffice for us to carry out the proof under H1.
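
The three numerical inequalities used in the substitution above are easy to confirm mechanically. The sketch below is only a sanity check over a finite range of degrees (the function names are hypothetical); the first inequality is squared so that everything stays in exact integer arithmetic.

```python
def h2_constant_ok(d):
    """(1/3) * d * 2^(9d/2) <= 30^d, checked after squaring both sides."""
    return d * d * 2 ** (9 * d) <= 9 * 900 ** d


def h3_constant_ok(d):
    """(1/3) * d * 8^d <= 10^d."""
    return d * 8 ** d <= 3 * 10 ** d


def h4_constant_ok(d):
    """(1/12) * d * 9^d <= 10^d."""
    return d * 9 ** d <= 12 * 10 ** d


def all_constants_ok(max_degree):
    """Check all three inequalities for every degree 1..max_degree."""
    return all(
        h2_constant_ok(d) and h3_constant_ok(d) and h4_constant_ok(d)
        for d in range(1, max_degree + 1)
    )
```

For larger degrees the inequalities only get slacker, since each ratio is of the form $d\cdot c^d$ with $c<1$.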







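Proof: We begin by defining intermediate sequences between $\mathcal{X}$ and $\mathcal{Y}$. For $i=0,1,\dots,n$, let $\mathcal{Z}^{(i)}$ denote the sequence of $n$ ensembles $(\mathcal{Y}_1,\dots,\mathcal{Y}_i,\mathcal{X}_{i+1},\dots,\mathcal{X}_n)$ and let $Q^{(i)}=Q(\mathcal{Z}^{(i)})$. Our goal will be to show

$$\bigl|\mathrm{E}[\Psi(Q^{(i-1)})]-\mathrm{E}[\Psi(Q^{(i)})]\bigr|\le\Bigl(\frac{2B}{r!}\,\eta^{-rd}\Bigr)\cdot\mathrm{Inf}_i(Q)^{r/2} \tag{8}$$

for each $i\in[n]$. Summing this over $i$ will complete the proof since $\mathcal{Z}^{(0)}=\mathcal{X}$, $\mathcal{Z}^{(n)}=\mathcal{Y}$ and

$$\sum_{i=1}^n\mathrm{Inf}_i(Q)^{r/2}\le\tau^{r/2-1}\cdot\sum_{i=1}^n\mathrm{Inf}_i(Q)=\tau^{r/2-1}\cdot\sum_{i=1}^n\mathrm{Inf}_i^{\le d}(Q)\le d\,\tau^{r/2-1},$$

where we used Proposition 3.8 and $\mathrm{Var}[Q]\le1$.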


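Let us fix a particular $i\in[n]$ and proceed to prove (8). Given a multi-index $\sigma$, write $\sigma\setminus i$ for the same multi-index except with $\sigma_i=0$. Now write

$$\tilde{Q}=\sum_{\sigma:\sigma_i=0}c_\sigma\,\mathcal{Z}^{(i)}_\sigma,\qquad R=\sum_{\sigma:\sigma_i>0}c_\sigma\,X_{i,\sigma_i}\cdot\mathcal{Z}^{(i)}_{\sigma\setminus i},\qquad S=\sum_{\sigma:\sigma_i>0}c_\sigma\,Y_{i,\sigma_i}\cdot\mathcal{Z}^{(i)}_{\sigma\setminus i}.$$

Note that $\tilde{Q}$ and the variables $\mathcal{Z}^{(i)}_{\sigma\setminus i}$ are independent of the variables in $\mathcal{X}_i$ and $\mathcal{Y}_i$ and that $Q^{(i-1)}=\tilde{Q}+R$ and $Q^{(i)}=\tilde{Q}+S$.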


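To bound the left side of (8), i.e., $|\mathrm{E}[\Psi(\tilde{Q}+R)-\Psi(\tilde{Q}+S)]|$, we use Taylor's theorem: for all $x,y\in\mathbb{R}$,

$$\Bigl|\Psi(x+y)-\sum_{k=0}^{r-1}\frac{\Psi^{(k)}(x)\,y^k}{k!}\Bigr|\le\frac{B}{r!}\,|y|^r.$$

In particular,

$$\Bigl|\mathrm{E}[\Psi(\tilde{Q}+R)]-\sum_{k=0}^{r-1}\mathrm{E}\Bigl[\frac{\Psi^{(k)}(\tilde{Q})\,R^k}{k!}\Bigr]\Bigr|\le\frac{B}{r!}\,\mathrm{E}[|R|^r] \tag{9}$$

and similarly,

$$\Bigl|\mathrm{E}[\Psi(\tilde{Q}+S)]-\sum_{k=0}^{r-1}\mathrm{E}\Bigl[\frac{\Psi^{(k)}(\tilde{Q})\,S^k}{k!}\Bigr]\Bigr|\le\frac{B}{r!}\,\mathrm{E}[|S|^r]. \tag{10}$$

We will see below that $R$ and $S$ have finite $r$th moments. Moreover, for $0\le k<r$ it holds that $|\Psi^{(k)}(\tilde{Q})R^k|\le|k!\,B\,\tilde{Q}^{r-k}R^k|$ (and similarly for $S$). Thus all moments above are finite. We now claim that for all $0\le k<r$ it holds that

$$\mathrm{E}[\Psi^{(k)}(\tilde{Q})\,R^k]=\mathrm{E}[\Psi^{(k)}(\tilde{Q})\,S^k]. \tag{11}$$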
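Indeed, writing $\sum_{(\sigma^1,\dots,\sigma^k)}$ for the sum over all $k$-tuples of multi-indices such that $\sigma^t_i>0$ for every $t$,

$$\begin{aligned}
\mathrm{E}[\Psi^{(k)}(\tilde{Q})\,R^k]
&=\mathrm{E}\Bigl[\Psi^{(k)}(\tilde{Q})\sum_{(\sigma^1,\dots,\sigma^k)}\ \prod_{t=1}^k c_{\sigma^t}\prod_{t=1}^k X_{i,\sigma^t_i}\prod_{t=1}^k\mathcal{Z}^{(i)}_{\sigma^t\setminus i}\Bigr]\\
&=\sum_{(\sigma^1,\dots,\sigma^k)}\ \prod_{t=1}^k c_{\sigma^t}\cdot\mathrm{E}\Bigl[\Psi^{(k)}(\tilde{Q})\prod_{t=1}^k\mathcal{Z}^{(i)}_{\sigma^t\setminus i}\Bigr]\cdot\mathrm{E}\Bigl[\prod_{t=1}^k X_{i,\sigma^t_i}\Bigr] &(12)\\
&=\sum_{(\sigma^1,\dots,\sigma^k)}\ \prod_{t=1}^k c_{\sigma^t}\cdot\mathrm{E}\Bigl[\Psi^{(k)}(\tilde{Q})\prod_{t=1}^k\mathcal{Z}^{(i)}_{\sigma^t\setminus i}\Bigr]\cdot\mathrm{E}\Bigl[\prod_{t=1}^k Y_{i,\sigma^t_i}\Bigr] &(13)\\
&=\mathrm{E}[\Psi^{(k)}(\tilde{Q})\,S^k].
\end{aligned}$$

The equality in (12) follows since $\mathcal{Z}^{(i)}_{\sigma^t\setminus i}$ and $\tilde{Q}$ are independent of the variables in $\mathcal{X}_i$ and $\mathcal{Y}_i$. The equality in (13) follows from the matching moments condition (7).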


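From (9), (10) and (11) it follows that

$$\bigl|\mathrm{E}[\Psi(\tilde{Q}+R)-\Psi(\tilde{Q}+S)]\bigr|\le\frac{B}{r!}\bigl(\mathrm{E}[|R|^r]+\mathrm{E}[|S|^r]\bigr). \tag{14}$$

We now use hypercontractivity. By Proposition 3.11 each $\mathcal{Z}^{(i)}$ is $(2,r,\eta)$-hypercontractive. Thus by Proposition 3.12,

$$\mathrm{E}[|R|^r]\le\eta^{-rd}\,\mathrm{E}[R^2]^{r/2},\qquad\mathrm{E}[|S|^r]\le\eta^{-rd}\,\mathrm{E}[S^2]^{r/2}. \tag{15}$$

However,

$$\mathrm{E}[S^2]=\mathrm{E}[R^2]=\sum_{\sigma:\sigma_i>0}c_\sigma^2=\mathrm{Inf}_i(Q). \tag{16}$$

Combining (14), (15) and (16) it follows that

$$\bigl|\mathrm{E}[\Psi(\tilde{Q}+R)-\Psi(\tilde{Q}+S)]\bigr|\le\Bigl(\frac{2B}{r!}\,\eta^{-rd}\Bigr)\cdot\mathrm{Inf}_i(Q)^{r/2},$$

confirming (8) and completing the proof. □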
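
As a toy illustration of Theorem 3.18 (not part of the paper's argument), one can take hypothesis H4 with the degree-1 polynomial $Q(x)=(x_1+\cdots+x_n)/\sqrt{n}$ and the test function $\Psi(x)=x^4$, which is $C^4$ with $|\Psi^{(4)}|=24$ everywhere. Here $\mathrm{Inf}_i(Q)=1/n$, and $Q(\mathcal{G})$ is a standard Gaussian with fourth moment 3. The sketch below, with hypothetical function names, computes the $\pm1$ side exactly and compares the gap with the H1 bound evaluated at $r=4$, $\eta=3^{-1/2}$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def fourth_moment_rademacher(n):
    """E[Q(X)^4] for Q = (x_1 + ... + x_n)/sqrt(n) over uniform {-1,1}^n, exactly."""
    total = sum(sum(x) ** 4 for x in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n * n ** 2)


# E[G^4] for a standard Gaussian G; Q applied to Gaussians is again standard Gaussian.
GAUSSIAN_FOURTH_MOMENT = 3


def h1_bound(n):
    """(2B/r!) * d * eta^(-rd) * tau^(r/2 - 1) with B=24, r=4, d=1, eta=3^(-1/2), tau=1/n."""
    B, r, d = 24, 4, 1
    eta_power = 3 ** (r * d // 2)  # eta^(-rd) = 3^(rd/2)
    tau = Fraction(1, n)
    return Fraction(2 * B, factorial(r)) * d * eta_power * tau ** (r // 2 - 1)


def gap(n):
    """|E[Psi(Q(X))] - E[Psi(Q(G))]| for Psi(x) = x^4."""
    return abs(GAUSSIAN_FOURTH_MOMENT - fourth_moment_rademacher(n))
```

In this example the gap is exactly $2/n$ while the bound is $18/n$, so the $\tau^{r/2-1}=\tau$ dependence under H4 is the right order here.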


3.5 Invariance principle — other functionals, and smoothed version 

Our basic invariance principle shows that $\mathrm{E}[\Psi(Q(\mathcal{X}))]$ and $\mathrm{E}[\Psi(Q(\mathcal{Y}))]$ are close if $\Psi$ is a $C^r$ functional with bounded $r$th derivative. To show that the distributions of $Q(\mathcal{X})$ and $Q(\mathcal{Y})$ are close in other senses we need the invariance principle for less smooth functionals. This we can obtain using straightforward approximation arguments; we defer the proof of Theorem 3.19, which follows, to Section 3.6.

Theorem 3.19 shows closeness of distribution in two senses. The first is closeness in Lévy's metric; recall that the distance between two random variables $R$ and $S$ in Lévy's metric is

$$d_L(R,S)=\inf\{\lambda>0:\ \forall t\in\mathbb{R},\ \mathrm{P}[S\le t-\lambda]-\lambda\le\mathrm{P}[R\le t]\le\mathrm{P}[S\le t+\lambda]+\lambda\}.$$
We also show the distributions are close in the usual sense with a weaker bound; the proof of this goes by comparing the distributions of $Q(\mathcal{X})$ and $Q(\mathcal{Y})$ to $Q(\mathcal{G})$ and noting that bounded-degree Gaussian polynomials are known to have low “small ball probabilities”. Finally, Theorem 3.19 also shows $L^1$ closeness and, as a technical necessity for applications, shows closeness under the functional $\zeta:\mathbb{R}\to\mathbb{R}$ defined by

$$\zeta(x)=\begin{cases}x^2&\text{if }x\le0,\\ 0&\text{if }x\in[0,1],\\ (x-1)^2&\text{if }x\ge1;\end{cases} \tag{17}$$

this functional gives the squared distance to the interval $[0,1]$.


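Theorem 3.19 Assume hypothesis H1, H2, H3, or H4. Further assume $\mathrm{Var}[Q]\le1$, $\deg(Q)\le d$ and $\mathrm{Inf}_i(Q)\le\tau$ for all $i$. Then

$$\bigl|\,\|Q(\mathcal{X})\|_1-\|Q(\mathcal{Y})\|_1\bigr|\le O(\epsilon^{1/r}), \tag{18}$$

$$d_L(Q(\mathcal{X}),Q(\mathcal{Y}))\le O(\epsilon^{1/(r+1)}), \tag{19}$$

$$\bigl|\mathrm{E}[\zeta(Q(\mathcal{X}))]-\mathrm{E}[\zeta(Q(\mathcal{Y}))]\bigr|\le O(\epsilon^{2/r}), \tag{20}$$

where $O(\cdot)$ hides a constant depending only on $r$, and

$$\epsilon=\begin{cases}d\,\eta^{-rd}\,\tau^{r/2-1}&\text{under H1,}\\ 30^d\,\beta^d\,\tau^{1/2}&\text{under H2,}\\ (10\alpha^{-1/2})^d\,\tau^{1/2}&\text{under H3,}\\ 10^d\,\tau&\text{under H4.}\end{cases}$$

If in addition $\mathrm{Var}[Q]=1$ then

$$\sup_t\bigl|\mathrm{P}[Q(\mathcal{X})\le t]-\mathrm{P}[Q(\mathcal{Y})\le t]\bigr|\le O\bigl(d\,\epsilon^{1/(rd+1)}\bigr). \tag{21}$$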


As discussed in Section 2.1, Theorem 3.19 has the unavoidable deficiency of having error bounds depending on the degree $d$ of $Q$. This can be overcome if we first “smooth” $Q$ by applying $T_{1-\gamma}$ to it, for some $0<\gamma<1$. Theorem 3.20, which follows, will be our main tool for applications; its proof is a straightforward degree truncation argument which we also defer to Section 3.6. As an additional benefit of this argument, we will show that $Q$ need only have small low-degree influences, $\mathrm{Inf}_i^{\le d}(Q)$, as opposed to small influences. As discussed at the end of Section 3.1, this feature has proven essential for applications involving PCPs.


Theorem 3.20 Assume hypothesis H1, H3, or H4. Further assume $\mathrm{Var}[Q]\le1$ and $\mathrm{Inf}_i^{\le\log(1/\tau)/K}(Q)\le\tau$ for all $i$, where

$$K=\begin{cases}\log(1/\eta)&\text{under H1,}\\ \log(1/\alpha)&\text{under H3,}\\ 1&\text{under H4.}\end{cases}$$

Given $0<\gamma<1$, write $R=(T_{1-\gamma}Q)(\mathcal{X})$ and $S=(T_{1-\gamma}Q)(\mathcal{Y})$. Then

$$d_L(R,S)\le\tau^{\Omega(\gamma/K)},\qquad\bigl|\mathrm{E}[\zeta(R)]-\mathrm{E}[\zeta(S)]\bigr|\le\tau^{\Omega(\gamma/K)},$$

where the $\Omega(\cdot)$ hides a constant depending only on $r$.

More generally the statement of the theorem holds for $R=Q(\mathcal{X})$, $S=Q(\mathcal{Y})$ if $\mathrm{Var}[Q^{>d}]\le(1-\gamma)^{2d}$ for all $d$.


3.6 Proofs of extensions of the invariance principle 

In this section we will prove Theorems 3.19 and 3.20 under hypothesis H1. The results under H2, H3, and H4 are corollaries.


3.6.1 Invariance principle for some $C^0$ and $C^1$ functionals

In this section we prove (18), (19), (20) of Theorem 3.19. We do it by approximating the following functions in the sup norm by smooth functions:

$$\ell(x)=|x|;\qquad\Delta_{s,t}(x)=\begin{cases}1&\text{if }x\le t-s,\\ \frac{t-x+s}{2s}&\text{if }x\in[t-s,t+s],\\ 0&\text{if }x\ge t+s;\end{cases}\qquad\zeta(x)=\begin{cases}x^2&\text{if }x\le0,\\ 0&\text{if }x\in[0,1],\\ (x-1)^2&\text{if }x\ge1.\end{cases}$$


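Lemma 3.21 Let $r\ge2$ be an integer. Then there exists a constant $B_r$ for which the following holds. For all $0<\lambda\le1/2$ there exist $C^\infty$ functions $\ell_\lambda$, $\Delta^\lambda_{\lambda,t}$ and $\zeta_\lambda$ satisfying the following:

• $\|\ell_\lambda-\ell\|_\infty\le2\lambda$; and, $\|(\ell_\lambda)^{(r)}\|_\infty\le4B_r\lambda^{1-r}$.

• $\Delta^\lambda_{\lambda,t}$ agrees with $\Delta_{\lambda,t}$ outside the interval $(t-2\lambda,t+2\lambda)$, and is otherwise in $[0,1]$; and, $\|(\Delta^\lambda_{\lambda,t})^{(r)}\|_\infty\le B_r\lambda^{-r}$.

• $\|\zeta_\lambda-\zeta\|_\infty\le2\lambda^2$; and, $\|(\zeta_\lambda)^{(r)}\|_\infty\le2B_{r-1}\lambda^{2-r}$.

Proof: Let $f(x)=x\,\mathbf{1}_{\{x\ge0\}}$. We will show that for all $\lambda>0$ there is a $C^\infty$ function $f_\lambda$ satisfying the following:

• $f_\lambda$ and $f$ agree on $(-\infty,-\lambda]$ and $[\lambda,\infty)$;

• $0\le f_\lambda(x)\le f(x)+\lambda$ on $(-\lambda,\lambda)$; and,

• $\|f_\lambda^{(r)}\|_\infty\le2B_r\lambda^{1-r}$.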

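The construction of $f_\lambda$ easily gives the construction of the other functionals by letting $\ell_\lambda(x)=f_\lambda(x)+f_\lambda(-x)$ and

$$\Delta^\lambda_{\lambda,t}(x)=\begin{cases}\frac{1}{2\lambda}\,f_\lambda(t-x+\lambda)&\text{if }x>t,\\ 1-\frac{1}{2\lambda}\,f_\lambda(x-t+\lambda)&\text{if }x\le t;\end{cases}$$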


t X (r) = f /-J ifx ^ V2, 

1 if x < 0 . 


( 22 ) 

To construct $f_\lambda$, first let $\psi$ be a nonnegative $C^\infty$ function satisfying the following: $\psi$ is 0 outside $(-1,1)$, $\int_{-1}^{1}\psi(x)\,dx=1$, and $\int_{-1}^{1}x\,\psi(x)\,dx=0$. It is well known that such functions $\psi$ exist. Define the constant $B_r$ to be $\|\psi^{(r)}\|_\infty$.

Next, write $\psi_\lambda(x)=\psi(x/\lambda)/\lambda$, so $\psi_\lambda$ satisfies the same three properties as $\psi$ with respect to the interval $(-\lambda,\lambda)$ rather than $(-1,1)$. Note that $\|\psi_\lambda^{(r)}\|_\infty=B_r\lambda^{-1-r}$.

Finally, take $f_\lambda=f*\psi_\lambda$, which is $C^\infty$. The first two properties demanded of $f_\lambda$ follow easily. To see the third, first note that $f_\lambda^{(r)}$ is identically 0 outside $(-\lambda,\lambda)$ and then observe that for $|x|<\lambda$,

$$|f_\lambda^{(r)}(x)|=|(f*\psi_\lambda)^{(r)}(x)|=|(f*\psi_\lambda^{(r)})(x)|\le\|\psi_\lambda^{(r)}\|_\infty\cdot\int_{x-\lambda}^{x+\lambda}|f|\le2B_r\lambda^{1-r}.$$

This completes the proof. □
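
The first two properties of $f_\lambda$ can be seen numerically. The sketch below is an illustration only: it assumes the standard bump $\psi(u)\propto\exp(-1/(1-u^2))$ as one admissible choice of $\psi$ (the paper does not fix a particular one), and it replaces the convolution integral by a midpoint rule on symmetric nodes, normalised so that the discrete weights sum to 1. The function names are hypothetical.

```python
import math


def bump(u):
    """Unnormalised smooth bump supported on (-1, 1); symmetric, so its mean is 0."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0


def make_mollified_ramp(lam, nodes=2001):
    """Return a discretised f_lam = f * psi_lam, where f(x) = max(x, 0)."""
    h = 2.0 * lam / nodes
    ys = [-lam + (j + 0.5) * h for j in range(nodes)]  # midpoints, symmetric about 0
    raw = [bump(y / lam) for y in ys]
    total = sum(raw)
    ws = [w / total for w in raw]  # discrete weights summing to 1

    def f_lam(x):
        # (f * psi_lam)(x) = integral of f(x - y) psi_lam(y) dy, here a weighted sum
        return sum(w * max(x - y, 0.0) for w, y in zip(ws, ys))

    return f_lam


def ramp(x):
    """The function f(x) = x * 1{x >= 0} being smoothed."""
    return max(x, 0.0)
```

Outside $(-\lambda,\lambda)$ the smoothed function reproduces the ramp because the kernel has total mass 1 and mean 0; inside, it stays between 0 and $f(x)+\lambda$.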


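We now prove (18), (19) and (20).

Proof: Note that the properties of $\Delta^\lambda_{\lambda,t}$ imply that

$$\mathrm{P}[R\le t-2\lambda]\le\mathrm{E}[\Delta^\lambda_{\lambda,t}(R)]\le\mathrm{P}[R\le t+2\lambda] \tag{23}$$

holds for every random variable $R$ and every $t$ and $0<\lambda\le1/2$.

Let us first prove (18), with

$$\epsilon=d\,\eta^{-rd}\,\tau^{r/2-1}$$

since we assume H1. Taking $\Psi=\ell_\lambda$ in Theorem 3.18 we obtain

$$\bigl|\mathrm{E}[\ell(Q(\mathcal{X}))]-\mathrm{E}[\ell(Q(\mathcal{Y}))]\bigr|\le\bigl|\mathrm{E}[\ell_\lambda(Q(\mathcal{X}))]-\mathrm{E}[\ell_\lambda(Q(\mathcal{Y}))]\bigr|+4\lambda\le(4B_r\lambda^{1-r}/r!)\,d\,\eta^{-rd}\,\tau^{r/2-1}+4\lambda=O(\epsilon\lambda^{1-r})+4\lambda.$$

Taking $\lambda=\epsilon^{1/r}$ gives the bound (18). Next, using (23) and applying Theorem 3.18 with $\Psi=\Delta^\lambda_{\lambda,t}$ we obtain

$$d_L(Q(\mathcal{X}),Q(\mathcal{Y}))\le\max\Bigl\{4\lambda,\ \sup_t\bigl|\mathrm{E}[\Delta^\lambda_{\lambda,t}(Q(\mathcal{X}))]-\mathrm{E}[\Delta^\lambda_{\lambda,t}(Q(\mathcal{Y}))]\bigr|\Bigr\}\le\max\bigl\{(B_r\lambda^{-r}/r!)\,d\,\eta^{-rd}\,\tau^{r/2-1},\ 4\lambda\bigr\}=\max\{O(\epsilon\lambda^{-r}),\ 4\lambda\}.$$

Again taking $\lambda=\epsilon^{1/(r+1)}$ we achieve (19). Finally, using $\Psi=\zeta_\lambda$ we get

$$\bigl|\mathrm{E}[\zeta(Q(\mathcal{X}))]-\mathrm{E}[\zeta(Q(\mathcal{Y}))]\bigr|\le\bigl|\mathrm{E}[\zeta_\lambda(Q(\mathcal{X}))]-\mathrm{E}[\zeta_\lambda(Q(\mathcal{Y}))]\bigr|+4\lambda^2\le(2B_{r-1}\lambda^{2-r}/r!)\,d\,\eta^{-rd}\,\tau^{r/2-1}+4\lambda^2=O(\epsilon\lambda^{2-r})+4\lambda^2,$$

and taking $\lambda=\epsilon^{1/r}$ we get (20). This concludes the proof of the first three bounds in Theorem 3.19. □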


3.6.2 Closeness in distribution 

We proceed to prove (21) from Theorem 3.19. By losing constant factors it will suffice to prove the bound in the case that $\mathcal{Y}=\mathcal{G}$, the sequence of independent Gaussian ensembles. As mentioned, we will use the fact that bounded-degree multilinear polynomials over $\mathcal{G}$ have low “small ball probabilities”. Specifically, the following theorem is an immediate consequence of Theorem 8 in m (taking $q=2d$ in their notation):

Theorem 3.22 There exists a universal constant $C$ such that for all multilinear polynomials $Q$ of degree $d$ over $\mathcal{G}$ and all $\epsilon>0$,

$$\mathrm{P}[|Q(\mathcal{G})|\le\epsilon]\le C\,d\,\bigl(\epsilon/\|Q(\mathcal{G})\|_2\bigr)^{1/d}.$$

Thus we have the following:

Corollary 3.23 For all multilinear polynomials $Q$ of degree $d$ over $\mathcal{G}$ with $\mathrm{Var}[Q]=1$ and for all $t\in\mathbb{R}$ and $\epsilon>0$,

$$\mathrm{P}[|Q(\mathcal{G})-t|\le\epsilon]\le O(d\,\epsilon^{1/d}).$$

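We now prove (21).

Proof: We will use Theorem 3.18 with $\Psi=\Delta^\lambda_{\lambda,t}$, where $\lambda$ will be chosen later. Writing $\Delta_t=\Delta^\lambda_{\lambda,t}$ for brevity and using fact (23) twice, we have

$$\begin{aligned}
\mathrm{P}[Q(\mathcal{X})\le t]&\le\mathrm{E}[\Delta_{t+2\lambda}(Q(\mathcal{X}))]\\
&\le\mathrm{E}[\Delta_{t+2\lambda}(Q(\mathcal{G}))]+\bigl|\mathrm{E}[\Delta_{t+2\lambda}(Q(\mathcal{X}))]-\mathrm{E}[\Delta_{t+2\lambda}(Q(\mathcal{G}))]\bigr|\\
&\le\mathrm{P}[Q(\mathcal{G})\le t+4\lambda]+\bigl|\mathrm{E}[\Delta_{t+2\lambda}(Q(\mathcal{X}))]-\mathrm{E}[\Delta_{t+2\lambda}(Q(\mathcal{G}))]\bigr|\\
&=\mathrm{P}[Q(\mathcal{G})\le t]+\mathrm{P}[t<Q(\mathcal{G})\le t+4\lambda]+\bigl|\mathrm{E}[\Delta_{t+2\lambda}(Q(\mathcal{X}))]-\mathrm{E}[\Delta_{t+2\lambda}(Q(\mathcal{G}))]\bigr|. &(24)
\end{aligned}$$

The second quantity in (24) is at most $O(d\,(4\lambda)^{1/d})$ by Corollary 3.23; the third quantity in (24) is at most $O(\epsilon\lambda^{-r})$ by Lemma 3.21 and Theorem 3.18. Thus we conclude

$$\mathrm{P}[Q(\mathcal{X})\le t]\le\mathrm{P}[Q(\mathcal{G})\le t]+O(d\,\lambda^{1/d})+O(\epsilon\lambda^{-r}),$$

independently of $t$. Similarly it follows that

$$\mathrm{P}[Q(\mathcal{X})\le t]\ge\mathrm{P}[Q(\mathcal{G})\le t]-O(d\,\lambda^{1/d})-O(\epsilon\lambda^{-r}),$$

independently of $t$. Choosing $\lambda=\epsilon^{d/(rd+1)}$ we get

$$\bigl|\mathrm{P}[Q(\mathcal{X})\le t]-\mathrm{P}[Q(\mathcal{G})\le t]\bigr|\le O\bigl(d\,\epsilon^{1/(rd+1)}\bigr),$$

as required. □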


The proof of Theorem 3.19 is now complete.
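
Each of the four bounds in Theorem 3.19 comes from choosing $\lambda$ as a power of $\epsilon$ that equalises two competing error terms. The following sketch, with a hypothetical function name, tracks only the exponent of $\epsilon$ in each term (ignoring constants and the factor $d$) and confirms that the choices of $\lambda$ made in the proofs balance them.

```python
from fractions import Fraction


def balanced_exponents(r, d):
    """Exponent of eps in the two competing terms of each bound, for the chosen lambda.

    Each value is a pair (exponent of the smoothing term, exponent of the
    approximation term) after substituting lambda = eps^a.
    """
    out = {}
    a = Fraction(1, r)            # (18): terms eps*lambda^(1-r) and lambda
    out["(18)"] = (1 + (1 - r) * a, a)
    a = Fraction(1, r + 1)        # (19): terms eps*lambda^(-r) and lambda
    out["(19)"] = (1 - r * a, a)
    a = Fraction(1, r)            # (20): terms eps*lambda^(2-r) and lambda^2
    out["(20)"] = (1 + (2 - r) * a, 2 * a)
    a = Fraction(d, r * d + 1)    # (21): terms eps*lambda^(-r) and lambda^(1/d)
    out["(21)"] = (1 - r * a, a / d)
    return out
```

For instance, with $r=3$ and $d=2$ the balanced exponents are $1/3$, $1/4$, $2/3$ and $1/7$, matching $\epsilon^{1/r}$, $\epsilon^{1/(r+1)}$, $\epsilon^{2/r}$ and $\epsilon^{1/(rd+1)}$.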


3.6.3 Invariance principle for smoothed functions 

The proof of Theorem 3.20 is by truncating at degree $d=c\log(1/\tau)/\log(1/\eta)$, where $c>0$ is a sufficiently small constant to be chosen later. Let $L(R)=(T_{1-\gamma}Q)^{\le d}(\mathcal{X})$, $H(R)=(T_{1-\gamma}Q)^{>d}(\mathcal{X})$, and define $L(S)$ and $H(S)$ analogously for $\mathcal{Y}$. Note that the low-degree influences of $T_{1-\gamma}Q$ are no more than those of $Q$.

We first prove the upper bound on $d_L(R,S)$. By Theorem 3.19 we have

$$d_L(L(R),L(S))\le d^{O(1)}\,\eta^{-O(d)}\,\tau^{\Omega(1)}=\eta^{-\Theta(d)}\,\tau^{\Omega(1)}. \tag{25}$$

As for $H(R)$ and $H(S)$ we have $\mathrm{E}[H(R)]=\mathrm{E}[H(S)]=0$ and $\mathrm{E}[H(R)^2]=\mathrm{E}[H(S)^2]\le(1-\gamma)^{2d}$ (since $\mathrm{Var}[Q]\le1$). Thus by Chebyshev's inequality it follows that for all $\lambda$,

$$\mathrm{P}[|H(R)|\ge\lambda]\le(1-\gamma)^{2d}/\lambda^2,\qquad\mathrm{P}[|H(S)|\ge\lambda]\le(1-\gamma)^{2d}/\lambda^2. \tag{26}$$

Combining (25) and (26) and taking $\lambda=(1-\gamma)^{2d/3}$ we conclude that the Lévy distance between $R$ and $S$ is at most

$$\eta^{-\Theta(d)}\,\tau^{\Theta(1)}+4(1-\gamma)^{2d/3}\le\eta^{-\Theta(d)}\,\tau^{\Theta(1)}+\exp(-\gamma\,\Theta(d)). \tag{27}$$

Our choice of $d$, with $c$ taken sufficiently small so that the second term above dominates, completes the proof of the upper bound on $d_L(R,S)$.

To prove the claim about $\zeta$ we need the following simple lemma:

Lemma 3.24 For all $a,b\in\mathbb{R}$, $|\zeta(a+b)-\zeta(a)|\le2|ab|+2b^2$.

Proof: We have

$$|\zeta(a+b)-\zeta(a)|\le|b|\sup_{x\in[a,a+b]}|\zeta'(x)|.$$

The claim follows since $|\zeta'(x)|\le2|x|$ for all $x$: indeed $\zeta'(x)=2x$ for $x<0$, $\zeta'(x)=0$ for $x\in[0,1]$, and $\zeta'(x)=2(x-1)$ for $x>1$; hence the supremum is at most $2(|a|+|b|)$. □
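
Lemma 3.24 can also be spot-checked on a grid of rational points. The sketch below is only a sanity check, not a proof, and uses hypothetical function names; $\zeta$ is implemented directly from (17).

```python
from fractions import Fraction


def zeta(x):
    """Squared distance from x to the interval [0, 1], as in (17)."""
    if x < 0:
        return x * x
    if x > 1:
        return (x - 1) ** 2
    return Fraction(0)


def lemma_3_24_holds(a, b):
    """Check |zeta(a + b) - zeta(a)| <= 2|ab| + 2b^2 at a single point."""
    return abs(zeta(a + b) - zeta(a)) <= 2 * abs(a * b) + 2 * b * b


def lemma_3_24_holds_on_grid(limit, step_denominator):
    """Check the inequality for all a, b in {k/step_denominator : |k| <= limit}."""
    grid = [Fraction(k, step_denominator) for k in range(-limit, limit + 1)]
    return all(lemma_3_24_holds(a, b) for a in grid for b in grid)
```

Rational arithmetic keeps the comparison exact, so cases of equality such as $a=0$, $b<0$ are not affected by rounding.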

By (20) in Theorem 3.19 we get the upper bound of $\eta^{-\Theta(d)}\tau^{\Theta(1)}$ for $|\mathrm{E}[\zeta(L(R))-\zeta(L(S))]|$. The lemma above and Cauchy-Schwarz imply

$$\begin{aligned}
\mathrm{E}\bigl[|\zeta(R)-\zeta(L(R))|\bigr]&=\mathrm{E}\bigl[|\zeta(L(R)+H(R))-\zeta(L(R))|\bigr]\le2\,\mathrm{E}[|L(R)H(R)|]+2\,\mathrm{E}[H(R)^2]\\
&\le2\sqrt{\mathrm{E}[H(R)^2]}+2\,\mathrm{E}[H(R)^2]\le2(1-\gamma)^d+2(1-\gamma)^{2d}\le\exp(-\gamma\,\Theta(d)),
\end{aligned}$$

and similarly for $S$. Thus

$$\bigl|\mathrm{E}[\zeta(R)]-\mathrm{E}[\zeta(S)]\bigr|\le\eta^{-\Theta(d)}\,\tau^{\Theta(1)}+\exp(-\gamma\,\Theta(d))$$

as in (27) and we get the same upper bound.

Finally, it is easy to see that the second statement of the theorem also holds as the only property of $R$ we have used is that $\mathrm{Var}[Q^{>d}]\le(1-\gamma)^{2d}$ for all $d$.


3.7 Invariance principle under Lyapunov conditions 

Here we sketch a proof of Theorem 2.2.

Proof: (sketch) Let $\Delta:\mathbb{R}\to[0,1]$ be a nondecreasing smooth function with $\Delta(0)=0$, $\Delta(1)=1$ and $A:=\sup_{x\in\mathbb{R}}|\Delta'''(x)|<\infty$. Then $\sup_{x\in\mathbb{R}}|\Delta''(x)|\le A/2$ and therefore for $x,y\in\mathbb{R}$ we have

$$|\Delta''(x)-\Delta''(y)|\le A^{3-q}\,|\Delta''(x)-\Delta''(y)|^{q-2}\le A^{3-q}\,(A|x-y|)^{q-2}=A\,|x-y|^{q-2}.$$

For $s>0$ let $\Delta_s(x)=\Delta(x/s)$, so that $|\Delta_s''(x)-\Delta_s''(y)|\le A\,s^{-q}\,|x-y|^{q-2}$ for all $x,y\in\mathbb{R}$. Let $Y$ and $Z$ be random variables with $\mathrm{E}[Y]=\mathrm{E}[Z]$, $\mathrm{E}[Y^2]=\mathrm{E}[Z^2]$ and $\mathrm{E}[|Y|^q],\mathrm{E}[|Z|^q]<\infty$. Then $|\mathrm{E}[\Delta_s(x+Y)]-\mathrm{E}[\Delta_s(x+Z)]|\le A\,s^{-q}\,(\mathrm{E}[|Y|^q]+\mathrm{E}[|Z|^q])$ for all $x\in\mathbb{R}$. Indeed, for $u\in[0,1]$ let $\varphi(u)=\mathrm{E}[\Delta_s(x+uY)]-\mathrm{E}[\Delta_s(x+uZ)]$. Then $\varphi(0)=\varphi'(0)=0$ and

$$|\varphi''(u)|=\bigl|\mathrm{E}[Y^2(\Delta_s''(x+uY)-\Delta_s''(x))]-\mathrm{E}[Z^2(\Delta_s''(x+uZ)-\Delta_s''(x))]\bigr|\le A\,s^{-q}\,u^{q-2}\,(\mathrm{E}[|Y|^q]+\mathrm{E}[|Z|^q]),$$

so that $|\varphi(1)|\le A\,s^{-q}\,(\mathrm{E}[|Y|^q]+\mathrm{E}[|Z|^q])$. Now, using the above estimate and the fact that both $\mathcal{X}$
and Q are ( 2 , q, 7 )-hypercontractive with y = Yq-i one arrives at 

|E[A s (Q(X 1 , ..., X n ))] - E[A S (Q(G U ..., G n ))]\ < 0(s" ECC 4) ?/2 )- 











Replacing $Q$ by $Q+t$ and using the arguments of Subsection 3.6.2 yields


sup|P[Q(X 1 ,...,X n ) <t]-P[Q(G u ...,G n ) <t]| < 

i SBi 

Optimizing over s ends the proof. We skip some elementary calculations. □ 


4 Proofs of the conjectures 

Our applications of the invariance principle have the following character: We wish to study certain noise stability properties of low-influence functions on finite product probability spaces. By using the invariance principle for slightly smoothed functions, Theorem 3.20, we can essentially analyze the properties in the product space of our choosing. And as it happens, the necessary result for Majority Is Stablest is already known in Gaussian space [2] and the necessary result for It Ain’t Over Till It’s Over is already known on the uniform-measure discrete cube ESI.

In the case of the Majority Is Stablest problem, one needs to find a set of prescribed Gaussian measure which maximizes the probability that the Ornstein-Uhlenbeck process (started at the Gaussian measure) will belong to the set at time 0 and at time $t$ for some fixed time $t$. This problem was solved by Borell in 5] using symmetrization arguments. It should also be noted that the analogous result for the sphere has been proven in more than one place, including a paper of Feige and Schechtman m. In fact, one can deduce Borell’s result and Majority Is Stablest from the spherical result using the proximity of spherical and Gaussian measures in high dimensions and the invariance principle proven here.


In the case of the It Ain’t Over Till It’s Over problem, the necessary result on the discrete 
cube {—1, l} n was essentially proven in the recent paper ESI using the reverse Bonami-Beckner 
inequality (which is also due to Borell jBj). This paper did not solve the conjecture though (nor 
did that paper note the relevance), even when the conjecture is set on {—1, l} n ; the reason is that 
reduction of the problem to a question about T p already involves transferring to a different product 
domain (e.g., {—1,0, l} n with biased measure) and so the invariance principle is required. 


Note that in both cases the necessary auxiliary result is valid without any assumptions about low influences. This should not be surprising in the Gaussian case, since given a multilinear polynomial $Q$ over Gaussians it is easy to define another multilinear polynomial $\tilde{Q}$ over Gaussians with exactly the same distribution and arbitrarily low influences, by letting

$$\tilde{Q}(x_{1,1},\dots,x_{1,N},\ \dots,\ x_{n,1},\dots,x_{n,N})=Q\Bigl(\frac{x_{1,1}+\cdots+x_{1,N}}{N^{1/2}},\ \dots,\ \frac{x_{n,1}+\cdots+x_{n,N}}{N^{1/2}}\Bigr).$$

The fact that low influences are not required for the results of ESI is perhaps more surprising.


4.1 Noise stability in Gaussian space 

We begin by recalling some definitions and results relevant for “Gaussian noise stability”. Throughout this section we consider $\mathbb{R}^n$ to have the standard $n$-dimensional Gaussian distribution, and our probabilities and expectations are over this distribution.
Let $U_\rho$ denote the Ornstein-Uhlenbeck operator acting on $L^2(\mathbb{R}^n)$ by

$$(U_\rho f)(x)=\mathop{\mathrm{E}}_{y}\bigl[f(\rho x+\sqrt{1-\rho^2}\,y)\bigr],$$

where $y$ is a random standard $n$-dimensional Gaussian. It is easy to see that if $f(x)$ is expressible as a multilinear polynomial in its $n$ independent Gaussian inputs,

$$f(x_1,\dots,x_n)=\sum_{S\subseteq[n]}c_S\prod_{i\in S}x_i,$$

then $U_\rho f$ is the following multilinear polynomial:

$$(U_\rho f)(x_1,\dots,x_n)=\sum_{S\subseteq[n]}\rho^{|S|}\,c_S\prod_{i\in S}x_i.$$

Thus $U_\rho$ acts identically to $T_\rho$ for multilinear polynomials $Q$ over $\mathcal{G}$, the Gaussian sequence of ensembles.
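
The identity $U_\rho f=\sum_S\rho^{|S|}c_S\prod_{i\in S}x_i$ can be checked exactly on an example. The sketch below relies on one observation that is an assumption of this illustration rather than something stated in the text: for multilinear $f$ the integrand $f(\rho x+\sqrt{1-\rho^2}\,y)$ has degree at most 1 in each $y_i$, so its Gaussian expectation coincides with the average over $y\in\{-1,1\}^n$ (only the mean of each $y_i$ matters). Choosing $\rho=3/5$ makes $\sqrt{1-\rho^2}=4/5$ rational. The names and the example polynomial are hypothetical.

```python
from fractions import Fraction
from itertools import product


def evaluate(coeffs, x):
    """Evaluate f(x) = sum_S c_S * prod_{i in S} x_i for coeffs = {frozenset S: c_S}."""
    total = Fraction(0)
    for S, c in coeffs.items():
        term = c
        for i in S:
            term *= x[i]
        total += term
    return total


def ornstein_uhlenbeck(coeffs, rho, s, x):
    """(U_rho f)(x) = E_y[f(rho*x + s*y)], where s = sqrt(1 - rho^2).

    Since f is multilinear, the expectation over independent mean-zero y_i
    is computed exactly by averaging over y in {-1, 1}^n.
    """
    n = len(x)
    total = Fraction(0)
    for y in product((-1, 1), repeat=n):
        point = [rho * xi + s * yi for xi, yi in zip(x, y)]
        total += evaluate(coeffs, point)
    return total / 2 ** n


def noise_operator(coeffs, rho):
    """Formal T_rho: multiply c_S by rho^{|S|}."""
    return {S: rho ** len(S) * c for S, c in coeffs.items()}


# Hypothetical example: f(x) = 1 + 2*x0 - x0*x1 + 3*x0*x1*x2
F = {
    frozenset(): Fraction(1),
    frozenset({0}): Fraction(2),
    frozenset({0, 1}): Fraction(-1),
    frozenset({0, 1, 2}): Fraction(3),
}
RHO, S_COEF = Fraction(3, 5), Fraction(4, 5)  # rho^2 + s^2 = 1
```

Evaluating both sides at any rational point then gives identical values, which is the coordinatewise statement $\mathrm{E}[\rho x_i+\sqrt{1-\rho^2}\,y_i]=\rho x_i$ multiplied out over $S$.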

Next, given any function $f:\mathbb{R}^n\to\mathbb{R}$, recall that its (Gaussian) nonincreasing spherical rearrangement is defined to be the upper semicontinuous nonincreasing function $f^*:\mathbb{R}\to\mathbb{R}$ which is equimeasurable with $f$; i.e., for all $t\in\mathbb{R}$, $f^*$ satisfies $\mathrm{P}[f>t]=\mathrm{P}[f^*>t]$ under Gaussian measure.

We now state a result of Borell concerning the Ornstein-Uhlenbeck operator $U_\rho$ (see also Ledoux’s Saint-Flour lecture notes d). Borell uses Ehrhard symmetrization to show the following:

Theorem 4.1 (Borell J2j/) Let $f,g\in L^2(\mathbb{R}^n)$. Then for all $0\le\rho\le1$ and all $q\ge1$,

$$\mathrm{E}[(U_\rho f)^q\cdot g]\le\mathrm{E}[(U_\rho f^*)^q\cdot g^*].$$

Borell’s result is more general and is stated for Lipschitz functions, but standard density arguments immediately imply the validity of the statement above. One immediate consequence of the theorem is that $\mathbb{S}_\rho(f)\le\mathbb{S}_\rho(f^*)$, where we define

$$\mathbb{S}_\rho(f)=\mathrm{E}[f\cdot U_\rho f]=\mathrm{E}[(U_{\sqrt{\rho}}f)^2]. \tag{28}$$

One can think of this quantity as the “(Gaussian) noise stability of $f$ at $\rho$”; again, it is compatible with our earlier definition of $\mathbb{S}_\rho$ if $f$ is a multilinear polynomial over $\mathcal{G}$.

Note that the latter equality in (28) and the fact that $U_{\sqrt{\rho}}$ is positivity-preserving and linear imply that $\sqrt{\mathbb{S}_\rho(\cdot)}$ defines an $L^2$ norm on $L^2(\mathbb{R}^n)$, dominated by the usual $L^2$ norm, so that it is a continuous convex functional on $L^2(\mathbb{R}^n)$. The set of all $[0,1]$-valued functions from $L^2(\mathbb{R}^n)$ having the same mean as $f$ is closed and bounded in the standard $L^2$ norm and one can easily check that its extremal points are indicator functions; hence by the Edgar-Choquet theorem (see D39; clearly $L^2(\mathbb{R}^n)$ is separable and it has the Radon-Nikodym property since it is a Hilbert space):

$$\sqrt{\mathbb{S}_\rho(f)}\le\sup_{\chi}\sqrt{\mathbb{S}_\rho(\chi)},$$

where the supremum is taken over all functions $\chi:\mathbb{R}^n\to\{0,1\}$ with $\mathrm{E}[\chi]=\mathrm{E}[f]$. Since by Borell’s result $\mathbb{S}_\rho(\chi)\le\mathbb{S}_\rho(\chi^*)$, we have $\mathbb{S}_\rho(f)\le\mathbb{S}_\rho(\chi_\mu)$ where $\chi_\mu:\mathbb{R}\to\{0,1\}$ is the indicator function of a halfline with measure $\mu=\mathrm{E}[f]$.

Let us introduce some notation: 
Definition 4.2 Given $\mu\in[0,1]$, define $\chi_\mu:\mathbb{R}\to\{0,1\}$ to be the indicator function of the interval $(-\infty,t]$, where $t$ is chosen so that $\mathrm{E}[\chi_\mu]=\mu$. Explicitly, $t=\Phi^{-1}(\mu)$, where $\Phi$ denotes the distribution function of a standard Gaussian. Furthermore, define

$$\Gamma_\rho(\mu)=\mathbb{S}_\rho(\chi_\mu)=\mathrm{P}[X\le t,\ Y\le t],$$

where $(X,Y)$ is a two dimensional Gaussian vector with covariance matrix $\begin{pmatrix}1&\rho\\ \rho&1\end{pmatrix}$.

Summarizing the above discussion, we obtain:

Corollary 4.3 Let $f:\mathbb{R}^n\to[0,1]$ be a measurable function on Gaussian space with $\mathrm{E}[f]=\mu$. Then for all $0\le\rho\le1$ we have $\mathbb{S}_\rho(f)\le\Gamma_\rho(\mu)$.

This is the result we will use to prove the Majority Is Stablest conjecture. We note that in general there is no closed form for $\Gamma_\rho(\mu)$; however, some asymptotics are known: For balanced functions we have Sheppard’s formula $\Gamma_\rho(1/2)=\frac14+\frac{1}{2\pi}\arcsin\rho$. Some other properties of $\Gamma_\rho(\mu)$ are given in Appendix B.
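
Sheppard’s formula can be compared against a direct numerical evaluation of $\mathrm{P}[X\le0,\ Y\le0]$. The sketch below is an illustration under stated assumptions: it conditions on $X=x$, uses that $Y\mid X=x$ is Gaussian with mean $\rho x$ and variance $1-\rho^2$ (so it requires $0\le\rho<1$), and integrates with a plain midpoint rule truncated at $-10$. The function names, step count and truncation point are arbitrary choices for this example.

```python
import math


def std_normal_pdf(z):
    """Density of a standard Gaussian."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Distribution function of a standard Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gamma_half_numeric(rho, steps=20000, lower=-10.0):
    """P[X <= 0, Y <= 0] for standard Gaussians with correlation rho in [0, 1)."""
    s = math.sqrt(1.0 - rho * rho)
    h = (0.0 - lower) / steps
    total = 0.0
    for j in range(steps):
        x = lower + (j + 0.5) * h
        # Given X = x, Y ~ N(rho*x, 1 - rho^2), so P[Y <= 0 | X = x] = Phi(-rho*x/s).
        total += std_normal_pdf(x) * std_normal_cdf(-rho * x / s)
    return total * h


def gamma_half_sheppard(rho):
    """Sheppard's formula: 1/4 + arcsin(rho) / (2*pi)."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)
```

At $\rho=0$ both expressions reduce to $1/4$, and as $\rho\to1$ they approach $1/2$, the measure of the halfline itself.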


4.2 Majority Is Stablest 

In this section we prove a strengthened form of the Majority Is Stablest conjecture. The implications 
of this result were discussed in Section EP 1 


Theorem 4.4 Let $f : \Omega_1 \times \cdots \times \Omega_n \to [0,1]$ be a function on a finite product probability space
and assume that for each $i$ the minimum probability of any atom in $\Omega_i$ is at least $\alpha \le 1/2$. Write
$K = \log(1/\alpha)$. Further assume that there is a $0 < \tau < 1/2$ such that $\mathrm{Inf}_i^{\le \log(1/\tau)/K}(f) \le \tau$ for
all $i$. (See Section 3 for the definition of low-degree influence.) Let $\mu = \mathbf{E}[f]$. Then for any
$0 < \rho < 1$,

\[ \mathbb{S}_\rho(f) \le \Gamma_\rho(\mu) + \epsilon, \]

where

\[ \epsilon = O\!\left(\frac{K}{1-\rho}\right) \cdot \frac{\log\log(1/\tau)}{\log(1/\tau)}. \]

For the reader's convenience we record here two facts from Appendix B:

\[ \Gamma_\rho(1/2) = \frac{1}{4} + \frac{1}{2\pi} \arcsin \rho, \]

\[ \Gamma_\rho(\mu) \sim \mu^{2/(1+\rho)} \, (4\pi \ln(1/\mu))^{-\rho/(1+\rho)} \quad \text{as } \mu \to 0. \]

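Proof: As discussed earlier, let $\mathcal{X}$ be the sequence of ensembles such that $\mathcal{X}_i$ spans the
functions on $\Omega_i$, and express $f$ as the multilinear polynomial $Q$. We use the invariance principle
under hypothesis H2. Express $\rho = \rho' \cdot (1-\gamma)^2$, where $0 < \gamma \ll 1 - \rho$ will be chosen later. Writing
$Q(x) = \sum_\sigma c_\sigma x_\sigma$ (with $c_{\mathbf{0}} = \mu$) we see that

\[ \mathbb{S}_\rho(Q(\mathcal{X})) = \sum_\sigma \left(\rho' \cdot (1-\gamma)^2\right)^{|\sigma|} c_\sigma^2 = \mathbb{S}_{\rho'}\big((T_{1-\gamma}Q)(\mathcal{G})\big), \]

where $\mathcal{G}$ is the sequence of independent Gaussian ensembles.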


Since $Q(\mathcal{X})$ is bounded in $[0,1]$ the same is true of $R = (T_{1-\gamma}Q)(\mathcal{X})$. In other words, $\mathbf{E}[\zeta(R)] = 0$,
where $\zeta$ is the function defined earlier. Writing $S = (T_{1-\gamma}Q)(\mathcal{G})$, we conclude from Theorem 3.20
that $\mathbf{E}[\zeta(S)] \le \tau^{\Omega(\gamma/K)}$. That is, $\|S - S'\|_2^2 \le \tau^{\Omega(\gamma/K)}$, where $S'$ is the random variable depending
on $S$ defined by

\[ S' = \begin{cases} 0 & \text{if } S \le 0, \\ S & \text{if } S \in [0,1], \\ 1 & \text{if } S \ge 1. \end{cases} \]


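Then

\begin{align*}
|\mathbb{S}_{\rho'}(S) - \mathbb{S}_{\rho'}(S')| &= |\mathbf{E}[S \cdot U_{\rho'} S] - \mathbf{E}[S' \cdot U_{\rho'} S']| \\
&\le |\mathbf{E}[S \cdot U_{\rho'} S] - \mathbf{E}[S' \cdot U_{\rho'} S]| + |\mathbf{E}[S' \cdot U_{\rho'} S] - \mathbf{E}[S' \cdot U_{\rho'} S']| \\
&\le (\|S\|_2 + \|S'\|_2)\, \|S - S'\|_2 \le \tau^{\Omega(\gamma/K)},
\end{align*}

where we have used the fact that $U_{\rho'}$ is a contraction on $L^2$.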

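Writing $\mu' = \mathbf{E}[S']$ it follows from Cauchy-Schwarz that $|\mu - \mu'| \le \tau^{\Omega(\gamma/K)}$. Since $S'$ takes
values in $[0,1]$ it follows from Corollary 4.3 that $\mathbb{S}_{\rho'}(S') \le \Gamma_{\rho'}(\mu')$. We thus conclude

\[ \mathbb{S}_\rho(Q(\mathcal{X})) = \mathbb{S}_{\rho'}(S) \le \mathbb{S}_{\rho'}(S') + \tau^{\Omega(\gamma/K)} \le \Gamma_{\rho'}(\mu') + \tau^{\Omega(\gamma/K)}. \]

We can now bound the difference $|\Gamma_\rho(\mu) - \Gamma_{\rho'}(\mu')|$ using Lemma B.3 and Corollary B.5 in
Appendix B. We get a contribution of $2|\mu - \mu'| \le \tau^{\Omega(\gamma/K)}$ from the difference in the $\mu$'s and a
contribution of at most $O(\gamma/(1-\rho))$ from the difference in the $\rho$'s. Thus we have

\[ \mathbb{S}_\rho(Q(\mathcal{X})) \le \Gamma_\rho(\mu) + \tau^{\Omega(\gamma/K)} + O(\gamma/(1-\rho)). \]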


Taking

\[ \gamma = C \cdot K \cdot \frac{\log\log(1/\tau)}{\log(1/\tau)} \]

for some large enough constant $C$ gives the claimed bound. □
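To see why this choice of $\gamma$ balances the two error terms, the following worked step may help. It is a sketch under one labeled assumption: $c > 0$ is a name introduced here for the constant hidden in the $\Omega(\cdot)$ of the bound $\tau^{\Omega(\gamma/K)}$, and $\tau$ is taken small enough that $\log\log(1/\tau) \ge 1$.

```latex
% Assumption: c > 0 denotes the implicit constant in \tau^{\Omega(\gamma/K)}.
\[
  \tau^{c\gamma/K}
  = \exp\!\Big(-c\,\frac{\gamma}{K}\,\log(1/\tau)\Big)
  = \exp\!\big(-cC\,\log\log(1/\tau)\big)
  = \big(\log(1/\tau)\big)^{-cC}.
\]
% Once C >= 1/c this is at most 1/\log(1/\tau), which is dominated by
\[
  O\!\Big(\frac{\gamma}{1-\rho}\Big)
  = O\!\Big(\frac{K}{1-\rho}\Big)\cdot\frac{\log\log(1/\tau)}{\log(1/\tau)} .
\]
```

So the smoothing error $O(\gamma/(1-\rho))$ is the dominant term, which matches the form of $\epsilon$ in the statement of Theorem 4.4.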


4.3 It Ain’t Over Till It’s Over 

As mentioned, our proof of the It Ain't Over Till It's Over conjecture will use a result due essentially
to [43]:

Theorem 4.5 Let $f : \{-1,1\}^n \to [0,1]$ have $\mathbf{E}[f] = \mu$ (with respect to uniform measure on
$\{-1,1\}^n$). Then for any $0 < \rho < 1$ and any $0 < \epsilon < 1 - \mu$ we have

\[ \mathbf{P}[T_\rho f > 1 - \delta] < \epsilon \]

provided

\[ \delta < \epsilon^{\rho^2/(1-\rho^2) + O(\kappa)}, \]

where

\[ \kappa = \frac{\sqrt{c(\mu)}}{1-\rho} \cdot \frac{1}{\sqrt{\log(1/\epsilon)}}, \qquad c(\mu) = \mu \log(e/(1-\mu)). \]

This theorem follows from the proof of Theorem 4.1 in [43]; for completeness we give an explicit
derivation in Appendix C.


Remark 4.6 Since the only fact about $\{-1,1\}^n$ used in the proof of Theorem 4.5 is the reverse
Bonami-Beckner inequality, and since this inequality also holds in Gaussian space, we conclude that
Theorem 4.5 also holds for measurable functions on Gaussian space $f : \mathbb{R}^n \to [0,1]$. In this setting
the result can be proved using Borell's Corollary 4.3 instead of using the reverse Bonami-Beckner
inequality.

The first step of the proof of It Ain't Over Till It's Over is to extend Theorem 4.5 to functions on
arbitrary product probability spaces. Note that if we only want to solve the problem for functions
on $\{-1,1\}^n$ with the uniform measure, this step is unnecessary. The proof of the extension is very
similar to the proof of Theorem 4.4. In order to state the theorem it is helpful to let $u > 0$
be a constant such that Theorem 3.20 holds with the bound $\tau^{u\gamma/K}$.


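Theorem 4.7 Let $f : \Omega_1 \times \cdots \times \Omega_n \to [0,1]$ be a function on a finite product probability space
and assume that for each $i$ the minimum probability of any atom in $\Omega_i$ is at least $\alpha \le 1/2$. Let
$K \ge \log(1/\alpha)$. Further assume that there is a $\tau > 0$ such that $\mathrm{Inf}_i^{\le \log(1/\tau)/K}(f) \le \tau$ for all $i$
(recall the definition of low-degree influence). Let $\mu = \mathbf{E}[f]$. Then for any $0 < \rho < 1$ there exists $\epsilon(\rho,\mu)$ such that if
$0 < \epsilon < \epsilon(\rho,\mu)$ we have

\[ \mathbf{P}[T_\rho f > 1 - \delta] < \epsilon \]

provided

\[ \delta < \epsilon^{\rho^2/(1-\rho^2) + C\kappa}, \qquad \tau < \epsilon^{(100K/(u(1-\rho)))\,(1/(1-\rho)^3 + C\kappa)}, \]

where

\[ \kappa = \frac{\sqrt{c(\mu)}}{1-\rho} \cdot \frac{1}{\sqrt{\log(1/\epsilon)}}, \qquad c(\mu) = \mu \log(e/(1-\mu)) + \epsilon, \]

and $C > 0$ is some constant.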


Proof: Without loss of generality we assume that $\delta = \epsilon^{\rho^2/(1-\rho^2) + C\kappa}$, as taking a smaller $\delta$ yields
a smaller tail probability. We can also assume $\epsilon(\rho,\mu) < 1/10$. Let $\mathcal{X}$ and $Q$ be as in the proof of
Theorem 4.4 and this time decompose $\rho = \rho' \cdot (1-\gamma)$ where we take $\gamma = \kappa \cdot (1-\rho)^2$. Note that taking
$\epsilon(\rho,\mu)$ sufficiently small we have $\kappa < 1$, $\gamma < 0.1$ and $(1-\rho)/(1-\rho') \le 2$. Let $R = (T_{1-\gamma}Q)(\mathcal{X})$ as
before, and let $S = (T_{1-\gamma}Q)(\mathcal{Y})$, where $\mathcal{Y}$ denotes the Rademacher sequence of ensembles ($Y_{i,0} = 1$,
$Y_{i,1} = \pm 1$ independently and uniformly random). Since $\mathbf{E}[\zeta(R)] = 0$ as before, we conclude from
Theorem 3.20 that we have $\mathbf{E}[\zeta(S)] \le \tau^{u\gamma/K} \le \epsilon^{10/(1-\rho) + 2C\kappa}$; i.e.,

\[ \|S - S'\|_2^2 \le \epsilon^{10/(1-\rho) + 2C\kappa}, \tag{29} \]

where $S'$ is the truncated version of $S$ as in the proof of Theorem 4.4. Now $S'$ is a function
$\{-1,1\}^n \to [0,1]$ with mean $\mu'$ differing from $\mu$ by at most $\epsilon^5$ (using Cauchy-Schwarz, as before).
This implies that $c(\mu') \le O(c(\mu))$.

Furthermore, our assumed upper bound on $\delta$ also holds with $\rho'$ in place of $\rho$. This is because

\[ \frac{\rho'^2}{1-\rho'^2} - \frac{\rho^2}{1-\rho^2} \le \frac{\rho'^2 - \rho^2}{(1-\rho'^2)^2} \le \frac{2\gamma}{(1-\rho')^2} \le \frac{8\gamma}{(1-\rho)^2} = 8\kappa. \]
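The individual steps of this chain can be spelled out as follows. This is a worked sketch using only facts fixed earlier in the proof ($\rho = \rho'(1-\gamma)$, $\gamma = \kappa(1-\rho)^2$, and $(1-\rho)/(1-\rho') \le 2$); the auxiliary function $g$ is a name introduced here for illustration.

```latex
% g(x) = x/(1-x) has derivative g'(x) = 1/(1-x)^2, which is increasing on [0,1).
% Since \rho' > \rho, the Mean Value Theorem gives
\[
  g(\rho'^2) - g(\rho^2) \;\le\; g'(\rho'^2)\,(\rho'^2 - \rho^2)
  \;=\; \frac{\rho'^2 - \rho^2}{(1-\rho'^2)^2}.
\]
% Numerator: \rho = \rho'(1-\gamma) and \rho' < 1 give
\[
  \rho'^2 - \rho^2 = \rho'^2\big(1 - (1-\gamma)^2\big) \le 2\gamma .
\]
% Denominator: 1 - \rho'^2 \ge 1 - \rho' \ge (1-\rho)/2, hence
\[
  \frac{2\gamma}{(1-\rho'^2)^2} \le \frac{2\gamma}{(1-\rho')^2}
  \le \frac{8\gamma}{(1-\rho)^2} = 8\kappa .
\]
```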


Thus Theorem 4.5 implies that if $C$ is sufficiently large then

\[ \mathbf{P}[T_{\rho'} S' > 1 - 4\delta] < \epsilon/2. \]

This, in turn, implies that

\[ \mathbf{P}[T_{\rho'} S > 1 - 2\delta] < 3\epsilon/4. \]

This follows by (29) since

\[ \mathbf{P}[T_{\rho'} S > 1 - 2\delta] - \mathbf{P}[T_{\rho'} S' > 1 - 4\delta] \le \delta^{-2} \|T_{\rho'} S - T_{\rho'} S'\|_2^2 \le \delta^{-2} \|S - S'\|_2^2. \]

We now use Theorem 3.20 again, bounding the Lévy distance of $(T_\rho Q)(\mathcal{Y})$ and $(T_\rho Q)(\mathcal{X})$ by
$\tau^{u(1-\rho)/K}$, which is smaller than $\delta$ and $\epsilon/8$. Thus

\[ \mathbf{P}[(T_\rho Q)(\mathcal{X}) > 1 - \delta] \le \mathbf{P}[T_{\rho'} S > 1 - 2\delta] + \epsilon/8 < \epsilon, \]

as needed. □

The second step of the proof of It Ain't Over Till It's Over is to use the invariance principle to
show that the random variable $V_\rho f$ (recall Definition 2.6) has essentially the same distribution as
$T_{\sqrt{\rho}} f$.

Theorem 4.8 Let $0 < \rho < 1$ and let $f : \Omega_1 \times \cdots \times \Omega_n \to [0,1]$ be a function on a finite product
probability space; assume that for each $i$ the minimum probability of any atom in $\Omega_i$ is at least
$\alpha \le 1/2$. Further assume that there is a $0 < \tau < 1/2$ such that $\mathrm{Inf}_i^{\le \log(1/\tau)/K'}(f) \le \tau$ for all $i$, where
$K' = \log(1/(\alpha\rho(1-\rho)))$. Then

\[ d_L(V_\rho f,\ T_{\sqrt{\rho}} f) \le \tau^{\Omega((1-\rho)/K')}. \]

Proof: Introduce $\mathcal{X}$ and $Q$ as in the proofs of Theorems 4.4 and 4.7. We now define a new
independent sequence of orthonormal ensembles $\mathcal{X}^{(\rho)}$ as follows. Let $V_1, \ldots, V_n$ be independent
random variables, each of which is 1 with probability $\rho$ and 0 with probability $1 - \rho$. Now define
$\mathcal{X}^{(\rho)} = (\mathcal{X}^{(\rho)}_1, \ldots, \mathcal{X}^{(\rho)}_n)$ by $X^{(\rho)}_{i,0} = 1$ for each $i$, and $X^{(\rho)}_{i,j} = \rho^{-1/2} V_i X_{i,j}$ for each $i$ and $j > 0$. It is
easy to verify that $\mathcal{X}^{(\rho)}$ is indeed an independent sequence of orthonormal ensembles. We will also
use the fact that each atom in the ensemble $\mathcal{X}^{(\rho)}_i$ has weight at least $\alpha' = \alpha \cdot \min\{\rho, 1-\rho\} \ge \alpha\rho(1-\rho)$.
(One can also use Proposition 3.17 to get a slightly better estimate on $K'$.)

The crucial observation is now simply that the random variable $V_\rho f$ has precisely the same
distribution as the random variable $(T_{\sqrt{\rho}} Q)(\mathcal{X}^{(\rho)})$. The reason is that when the randomness in
the $V_i = 1$ ensembles is fixed, the expectation of the restricted function is given by substituting 0
for all other random variables $X^{(\rho)}_{i,j}$. The $T_{\sqrt{\rho}}$ serves to cancel the factors $\rho^{-1/2}$ introduced in the
definition of $X^{(\rho)}_{i,j}$ to ensure orthonormality.

It now simply remains to use Theorem 3.20 to bound the Lévy distance of $(T_{\sqrt{\rho}} Q)(\mathcal{X}^{(\rho)})$ and
$(T_{\sqrt{\rho}} Q)(\mathcal{X})$, where here $\mathcal{X}$ denotes a copy of this sequence of ensembles. We use hypothesis H3
and get a bound of $\tau^{\Omega((1-\sqrt{\rho})/K')} = \tau^{\Omega((1-\rho)/K')}$, as required. □

Our generalization of It Ain't Over Till It's Over is now simply a corollary of Theorems 4.7
and 4.8. By taking $K'$ instead of $K$ in the upper bound on $\tau$ and taking $\delta$ to have its maximum
possible value, we make the error of

\[ \tau^{u((1-\rho)/K')} \le \epsilon^{(100/(1-\rho))\,(1/(1-\rho)^3 + C\kappa)} \]

from Theorem 4.8, which is negligible compared to both $\epsilon$ and $\delta$ below.

Theorem 4.9 Let $0 < \rho < 1$ and let $f : \Omega_1 \times \cdots \times \Omega_n \to [0,1]$ be a function on a finite product
probability space; assume that for each $i$ the minimum probability of any atom in $\Omega_i$ is at least
$\alpha \le 1/2$. Further assume that there is a $0 < \tau < 1/2$ such that $\mathrm{Inf}_i^{\le \log(1/\tau)/K'}(f) \le \tau$ for all $i$, where
$K' = \log(1/(\alpha\rho(1-\rho)))$. Let $\mu = \mathbf{E}[f]$. Then there exists an $\epsilon(\rho,\mu) > 0$ such that if $\epsilon < \epsilon(\rho,\mu)$
then

\[ \mathbf{P}[V_\rho f > 1 - \delta] < \epsilon \]

provided

\[ \delta < \epsilon^{\rho^2/(1-\rho^2) + C\kappa}, \qquad \tau < \epsilon^{(100K'/(u(1-\rho)))\,(1/(1-\rho)^3 + C\kappa)}, \]

where

\[ \kappa = \frac{\sqrt{c(\mu)}}{1-\rho} \cdot \frac{1}{\sqrt{\log(1/\epsilon)}}, \qquad c(\mu) = \mu \log(e/(1-\mu)) + \epsilon, \]

and $C > 0$ is some finite constant.

Remark 4.10 To get $V_\rho f$ bounded away from both 0 and 1, as desired in the It Ain't Over Till It's Over
conjecture, simply use Theorem 4.9 twice, once with $f$ and once with $1 - f$.


5 Weight at low levels — a counterexample 


The simplest version of the Majority Is Stablest result states roughly that among all balanced
functions $f : \{-1,1\}^n \to \{-1,1\}$ with small influences, the Majority function maximizes $\sum_S \rho^{|S|} \hat{f}(S)^2$
for each $\rho$. One might conjecture that more is true; specifically, that Majority maximizes $\sum_{|S| \le d} \hat{f}(S)^2$
for each $d = 1, 2, 3, \ldots$. This is known to be the case for $d = 1$ (ESI) and is somewhat suggested
by the theorem of Bourgain GH which says that $\sum_{|S| \le d} \hat{f}(S)^2 \le 1 - d^{-1/2 - o(1)}$ for functions with
low influences. An essentially weaker conjecture was made by Kalai EH:

Conjecture 5.1 Let d> 1 and let C n denote the collection of all functions f : {—1, l} n —> {—1,1} 
which are odd and transitive-symmetric (see Section \2.d.l\ ’s discussion of m )■ Then 


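\[ \limsup_{n \to \infty} \ \sup_{f \in C_n} \ \sum_{|S| \le d} \hat{f}(S)^2 \;=\; \lim_{n \text{ odd} \to \infty} \ \sum_{|S| \le d} \widehat{\mathrm{Maj}_n}(S)^2. \]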


We now show that these conjectures are false: We construct a sequence $(f_n)$ of completely
symmetric odd functions with small influences that have more weight on levels 1, 2, and 3 than
Majority has. By "completely symmetric" we mean that $f_n(x)$ depends only on $\sum_{i=1}^n x_i$; because
of this symmetry our counterexample is more naturally viewed in terms of the Hermite expansions
of functions $f : \mathbb{R} \to \{-1,1\}$ on one-dimensional Gaussian space.


There are several normalizations of the Hermite polynomials in the literature. We will follow [41]
and define them to be the orthonormal polynomials with respect to the one-dimensional Gaussian
density function $\varphi(x) = e^{-x^2/2}/\sqrt{2\pi}$. Specifically, we define the Hermite polynomials $h_d(x)$ for
$d \in \mathbb{N}$ by

\[ \exp(\lambda x - \lambda^2/2) = \sum_{d=0}^{\infty} \frac{\lambda^d}{\sqrt{d!}} \, h_d(x). \]

The first few such polynomials are $h_0(x) = 1$, $h_1(x) = x$, $h_2(x) = (x^2 - 1)/\sqrt{2}$, and $h_3(x) =
(x^3 - 3x)/\sqrt{6}$. The orthonormality condition these polynomials satisfy is

\[ \int_{\mathbb{R}} h_d(x) h_{d'}(x) \varphi(x) \, dx = \begin{cases} 1 & \text{if } d = d', \\ 0 & \text{else.} \end{cases} \]
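The orthonormality condition is easy to check numerically for the four polynomials listed above. The following is an illustrative sketch, not taken from the paper: it hard-codes $h_0, \ldots, h_3$ as stated and approximates the integral by a trapezoid rule on $[-12, 12]$ (the cutoff and step count are assumptions chosen so that the neglected Gaussian tail is far below floating-point precision).

```python
import math


def phi(x: float) -> float:
    """Standard Gaussian density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# The first four orthonormal Hermite polynomials, as listed in the text.
H = [
    lambda x: 1.0,
    lambda x: x,
    lambda x: (x * x - 1) / math.sqrt(2),
    lambda x: (x ** 3 - 3 * x) / math.sqrt(6),
]


def inner(d: int, e: int, lim: float = 12.0, steps: int = 4800) -> float:
    """Trapezoid approximation of the integral of h_d * h_e * phi over R."""
    h = 2 * lim / steps
    total = 0.0
    for k in range(steps + 1):
        x = -lim + k * h
        w = 0.5 if k in (0, steps) else 1.0  # half weight at the endpoints
        total += w * H[d](x) * H[e](x) * phi(x)
    return total * h


# Gram matrix of h_0..h_3; it should be close to the 4x4 identity.
gram = [[inner(d, e) for e in range(4)] for d in range(4)]
```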


We will actually henceforth consider functions whose domain is $\mathbb{R}^* = \mathbb{R} \setminus \{0\}$, for simplicity;
the value of a function at a single point makes no difference to its Hermite expansion. Given a
function $f : \mathbb{R}^* \to \mathbb{R}$ we write $\hat{f}(d)$ for $\int h_d(x) f(x) \varphi(x) \, dx$. Let us also use the notation Maj for
the function which is 1 on $(0,\infty)$ and $-1$ on $(-\infty,0)$.

Theorem 5.2 There is an odd function $f : \mathbb{R}^* \to \{-1,1\}$ with

\[ \sum_{d \le 3} \hat{f}(d)^2 \ge .75913 > \frac{2}{\pi} + \frac{1}{3\pi} = \sum_{d \le 3} \widehat{\mathrm{Maj}}(d)^2. \]


Proof: Let $t > 0$ be a parameter to be chosen later, and let $f$ be the function which is 1 on
$(-\infty, -t]$ and $(0,t)$, and $-1$ on $(-t, 0)$ and $[t,\infty)$. Since $f$ is odd, $\hat{f}(0) = \hat{f}(2) = 0$. Elementary
integration gives

\[ F_1(t) = \int h_1(x) \varphi(x) \, dx = -e^{-t^2/2}/\sqrt{2\pi}, \qquad F_3(t) = \int h_3(x) \varphi(x) \, dx = (1 - t^2) e^{-t^2/2}/\sqrt{12\pi}; \]

thus

\begin{align*}
\hat{f}(1) &= 2(F_1(t) + F_1(-t) - F_1(0)) - F_1(\infty) - F_1(-\infty) = \sqrt{2/\pi} \, (1 - 2e^{-t^2/2}), \\
\hat{f}(3) &= 2(F_3(t) + F_3(-t) - F_3(0)) - F_3(\infty) - F_3(-\infty) = -\sqrt{1/(3\pi)} \, (1 - 2(1 - t^2) e^{-t^2/2}).
\end{align*}

We conclude

\[ \sum_{d \le 3} \hat{f}(d)^2 = \frac{2}{\pi} \left(1 - 2e^{-t^2/2}\right)^2 + \frac{1}{3\pi} \left(1 - 2(1 - t^2) e^{-t^2/2}\right)^2. \tag{30} \]

As $t \to 0$ or $\infty$ we recover the fact, well known in the boolean regime (see, e.g., [B]), that
$\sum_{d \le 3} \widehat{\mathrm{Maj}}(d)^2 = 2/\pi + 1/(3\pi)$. But the above expression is not maximized for these $t$; rather, it is
maximized at $t = 2.69647$, where the expression becomes roughly .75913. Fixing this particular $t$
completes the proof. □
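The numerical claim in this proof can be reproduced with a few lines of code. The sketch below is illustrative only: it evaluates the closed-form expression for the weight on levels at most 3 (the sum of $\frac{2}{\pi}(1 - 2e^{-t^2/2})^2$ and $\frac{1}{3\pi}(1 - 2(1-t^2)e^{-t^2/2})^2$) and locates its maximizer by a plain grid search; the function name `low_level_weight` and the grid range $(0, 10]$ with step $0.001$ are assumptions made for this example.

```python
import math


def low_level_weight(t: float) -> float:
    """Weight of f on Hermite levels <= 3, as a function of the parameter t."""
    a = math.exp(-t * t / 2)
    level1 = (2 / math.pi) * (1 - 2 * a) ** 2
    level3 = (1 / (3 * math.pi)) * (1 - 2 * (1 - t * t) * a) ** 2
    return level1 + level3


# Value attained by Maj (the limit of the expression as t -> 0 or t -> infinity).
MAJ_WEIGHT = 2 / math.pi + 1 / (3 * math.pi)

# Coarse grid search over t in (0, 10] with step 0.001.
grid = [k / 1000 for k in range(1, 10001)]
best_t = max(grid, key=low_level_weight)
best_value = low_level_weight(best_t)
```

Here `best_t` can be compared with the value $t = 2.69647$ quoted in the proof, and `best_value` with both .75913 and `MAJ_WEIGHT`.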

It is now clear how to construct the sequence of completely symmetric odd functions $f_n :
\{-1,1\}^n \to \{-1,1\}$ with the same property: take $f_n(x) = f((x_1 + \cdots + x_n)/\sqrt{n})$. The proof
that the property holds follows essentially from the fact that the limits of Kravchuk polynomials
are Hermite polynomials. For completeness, we give a direct proof of Corollary 5.3 in Appendix D.

Corollary 5.3 For $n$ odd there is a sequence of completely symmetric odd functions $f_n : \{-1,1\}^n \to
\{-1,1\}$ satisfying $\mathrm{Inf}_i(f_n) \le O(1/\sqrt{n})$ for each $i$, and

\[ \lim_{n \text{ odd} \to \infty} \sum_{|S| \le 3} \hat{f}_n(S)^2 \ge 0.75913 > \frac{2}{\pi} + \frac{1}{3\pi} = \lim_{n \text{ odd} \to \infty} \sum_{|S| \le 3} \widehat{\mathrm{Maj}_n}(S)^2. \]

In light of this counterexample, it seems we can only hope to sharpen Bourgain's Theorem 2.7
in the asymptotic setting; one might ask whether its upper bound can be improved to

\[ 1 - (1 - o(1)) (2/\pi)^{3/2} d^{-1/2}, \]

the asymptotics for Majority.

A Hypercontractivity of sequences of ensembles

We now give the proofs of Propositions 3.11 and 3.12 that were omitted. As mentioned, the proofs
are completely straightforward adaptations of the analogous proofs in [29].

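Proof: (of Proposition 3.11) Let $Q$ be a multilinear polynomial over $\mathcal{X} \cup \mathcal{Y}$. Note that we can
write $Q(\mathcal{X} \cup \mathcal{Y})$ as $\sum_j c_j X_{\sigma_j} Y_{\upsilon_j}$, where the $\sigma$'s are multi-indexes for $\mathcal{X}$, the $\upsilon$'s are multi-indexes
for $\mathcal{Y}$, and the $c_j$'s are constants. Then

\begin{align*}
\|(T_\eta Q)(\mathcal{X} \cup \mathcal{Y})\|_q
&= \Big\| \Big\| \sum_j \eta^{|\sigma_j| + |\upsilon_j|} c_j X_{\sigma_j} Y_{\upsilon_j} \Big\|_{L^q(\mathcal{Y})} \Big\|_{L^q(\mathcal{X})} \\
&= \Big\| \Big\| T_\eta^{(\mathcal{Y})} \Big( \sum_j (\eta^{|\sigma_j|} c_j X_{\sigma_j}) Y_{\upsilon_j} \Big) \Big\|_{L^q(\mathcal{Y})} \Big\|_{L^q(\mathcal{X})} \\
&\le \Big\| \Big\| \sum_j (\eta^{|\sigma_j|} c_j X_{\sigma_j}) Y_{\upsilon_j} \Big\|_{L^p(\mathcal{Y})} \Big\|_{L^q(\mathcal{X})} \\
&\le \Big\| \Big\| \sum_j (\eta^{|\sigma_j|} c_j X_{\sigma_j}) Y_{\upsilon_j} \Big\|_{L^q(\mathcal{X})} \Big\|_{L^p(\mathcal{Y})} \\
&= \Big\| \Big\| T_\eta^{(\mathcal{X})} \Big( \sum_j (c_j Y_{\upsilon_j}) X_{\sigma_j} \Big) \Big\|_{L^q(\mathcal{X})} \Big\|_{L^p(\mathcal{Y})} \\
&\le \Big\| \Big\| \sum_j c_j Y_{\upsilon_j} X_{\sigma_j} \Big\|_{L^p(\mathcal{X})} \Big\|_{L^p(\mathcal{Y})} \\
&= \|Q(\mathcal{X} \cup \mathcal{Y})\|_p,
\end{align*}

where the second inequality used a simple consequence of the integral version of Minkowski's
inequality and $p \le q$ (see :2D, Prop. C.4]). □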


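Proof: (of Proposition 3.12) Note that if $Q = Q^{=d}$ then we obviously have equality. In the general
case, write $Q = \sum_{i=0}^{d} Q^{=i}$ and note that $\mathbf{E}[Q^{=i}(\mathcal{X}) Q^{=j}(\mathcal{X})] = 0$ for $i \ne j$ is easy to check. Thus

\begin{align*}
\|Q(\mathcal{X})\|_q &= \Big\| T_\eta \Big( \sum_{i=0}^{d} \eta^{-i} Q^{=i}(\mathcal{X}) \Big) \Big\|_q \le \Big\| \sum_{i=0}^{d} \eta^{-i} Q^{=i}(\mathcal{X}) \Big\|_2 \\
&= \Big( \sum_{i=0}^{d} \eta^{-2i} \|Q^{=i}(\mathcal{X})\|_2^2 \Big)^{1/2} \le \eta^{-d} \Big( \sum_{i=0}^{d} \|Q^{=i}(\mathcal{X})\|_2^2 \Big)^{1/2} = \eta^{-d} \|Q(\mathcal{X})\|_2.
\end{align*}

□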


Let us also mention some standard facts about the $(2,q,\eta)$-hypercontractivity of random variables.
Let $q > 2$. Clearly, if we want $X$ to be $(2,q,\eta)$-hypercontractive, we must assume
that $\mathbf{E}[|X|^q] < \infty$. If $X$ is $(2,q,\eta)$-hypercontractive for some $\eta \in (0,1)$ then $\mathbf{E}[X] = 0$ and
$\eta \le (q-1)^{-1/2}$. Indeed, it suffices to consider the first and second order Taylor expansions in both
sides of the inequality $\|1 + \eta b X\|_q \le \|1 + bX\|_2$ as $b \to 0$. We leave details to the reader.


We now give the proofs of Proposition 3.16 and Proposition 3.17, which follow the argument of
Szulga O Prop. 2.20]:


Proof: (of Proposition 3.16) Let $X'$ be an independent copy of $X$ and put $Y = X - X'$. By the
triangle inequality $\|Y\|_q \le 2\|X\|_q$. Let $\epsilon$ be a symmetric $\pm 1$ Bernoulli random variable independent
of $Y$. Note that $Y$ is symmetric, so it has the same distribution as $\epsilon Y$. Now by Jensen's inequality,
Fubini's theorem, $(2, q, (q-1)^{-1/2})$-hypercontractivity of $\epsilon$, and Minkowski's inequality we get

\begin{align*}
\|a + \eta_q X\|_q &\le \|a + \eta_q Y\|_q = \|a + \eta_q \epsilon Y\|_q \le \Big( \mathbf{E}_Y \Big[ \big( \mathbf{E}_\epsilon [ |a + (q-1)^{1/2} \eta_q \epsilon Y|^2 ] \big)^{q/2} \Big] \Big)^{1/q} \\
&= \Big( \mathbf{E} \big[ (a^2 + (q-1) \eta_q^2 Y^2)^{q/2} \big] \Big)^{1/q} = \|a^2 + (q-1) \eta_q^2 Y^2\|_{q/2}^{1/2} \le \big( a^2 + (q-1) \eta_q^2 \|Y^2\|_{q/2} \big)^{1/2} \\
&= \Big( a^2 + \frac{\|Y\|_q^2}{4\|X\|_q^2} \cdot \mathbf{E}[X^2] \Big)^{1/2} \le \big( a^2 + \mathbf{E}[X^2] \big)^{1/2} = \|a + X\|_2.
\end{align*}

□

Proof: (of Proposition 3.17) Let $(X', V')$ be an independent copy of $(X, V)$ and put $Y =
VX - V'X'$. Then $\|Y\|_q \le 2\|V\|_q \|X\|_q = 2\rho^{1/q} \|X\|_q$ and as in the previous proof we get

\[ \|a + \xi_q VX\|_q \le \|a + \xi_q Y\|_q \le \big( a^2 + (q-1) \xi_q^2 \|Y\|_q^2 \big)^{1/2} \le \big( a^2 + 4(q-1) \xi_q^2 \rho^{2/q} \|X\|_q^2 \big)^{1/2} = \|a + VX\|_2. \]

□


If $X$ is defined on a finite probability space in which the probability of all atoms is at least $\alpha$ then
obviously $\mathbf{E}[X^2] \ge \alpha \|X\|_\infty^2$, so that $\mathbf{E}[|X|^q] \le \|X\|_\infty^{q-2} \cdot \mathbf{E}[X^2] \le (\mathbf{E}[X^2])^{q/2} \alpha^{1 - q/2}$. In particular, if
$q = 3$ then $\|X\|_3 / \|X\|_2 \le \alpha^{-1/6}$, so that $VX$ is $(2, 3, \xi_3)$-hypercontractive with $\xi_3 = 2^{-3/2} \alpha^{1/6} \rho^{1/6}$.

Let us also point out that if $\mathbf{E}[X^4] < \infty$ and $X$ is symmetric then a direct and elementary
calculation shows that $X$ is $(2, 4, \eta_4)$-hypercontractive with $\eta_4 = \min(3^{-1/2}, \|X\|_2 / \|X\|_4)$ and the
constant is optimal. Therefore the random variable $X^{(\rho)}_i$ which appears in the proof of Theorem 4.8
is $(2, 4, \min(\rho^{1/4}, 3^{-1/2}))$-hypercontractive if $X$ is the $\pm 1$ Rademacher ensemble; this can be used
to get a smaller value for $K'$ if $\rho$ is close to 1.
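The "direct and elementary calculation" for symmetric $X$ can be sketched as follows. This is a worked step supplied for illustration, written for real $a, b$ and using only that the odd moments of a symmetric random variable vanish.

```latex
% Symmetric X: E[X] = E[X^3] = 0, so expanding the fourth power gives
\[
  \mathbf{E}\big[(a + \eta b X)^4\big]
  = a^4 + 6\eta^2 a^2 b^2\, \mathbf{E}[X^2] + \eta^4 b^4\, \mathbf{E}[X^4],
\]
\[
  \big(\mathbf{E}\big[(a + bX)^2\big]\big)^2
  = a^4 + 2 a^2 b^2\, \mathbf{E}[X^2] + b^4 \big(\mathbf{E}[X^2]\big)^2 .
\]
% Comparing term by term, \|a + \eta b X\|_4 \le \|a + bX\|_2 holds as soon as
\[
  6\eta^2 \le 2 \quad\text{and}\quad \eta^4\, \mathbf{E}[X^4] \le \big(\mathbf{E}[X^2]\big)^2,
  \qquad\text{i.e.}\qquad
  \eta \le \min\!\big(3^{-1/2},\ \|X\|_2/\|X\|_4\big).
\]
% For X = \rho^{-1/2} V \epsilon with \epsilon Rademacher and P[V = 1] = \rho:
% \|X\|_2 = 1 and \|X\|_4 = \rho^{-1/4}, so the second bound becomes \rho^{1/4}.
```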

B Properties of $\Gamma_\rho(\mu)$

Sheppard's Formula m gives the value of $\Gamma_\rho(1/2)$:

Theorem B.1 $\Gamma_\rho(1/2) = \frac{1}{4} + \frac{1}{2\pi} \arcsin \rho$.

For fixed $\rho$, the asymptotics of $\Gamma_\rho(\mu)$ as $\mu \to 0$ can be determined precisely; calculations of this
nature appear in EL HI-

Theorem B.2 As $\mu \to 0$,

\[ \Gamma_\rho(\mu) \sim \mu^{2/(1+\rho)} \, (4\pi \ln(1/\mu))^{-\rho/(1+\rho)}. \]

Proof: This follows from, e.g., Lemma 11.1 of El; although we have $\rho > 0$ as opposed to $\rho < 0$
as in M, the formula there can still be seen to hold when $x = y$ (in their notation). □

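Lemma B.3 For all $0 < \rho < 1$ and all $0 < \mu_1 < \mu_2 < 1$,

\[ \Gamma_\rho(\mu_2) - \Gamma_\rho(\mu_1) \le 2(\mu_2 - \mu_1). \]

Proof: Let $X$ and $Y$ be $\rho$-correlated Gaussians and write $t_i = \Phi^{-1}(\mu_i)$. Then

\begin{align*}
\Gamma_\rho(\mu_2) - \Gamma_\rho(\mu_1) &= \mathbf{P}[X \le t_2, Y \le t_2] - \mathbf{P}[X \le t_1, Y \le t_1] \\
&\le 2\, \mathbf{P}[t_1 < X \le t_2] = 2(\mu_2 - \mu_1).
\end{align*}

□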

Lemma B.4 Let $0 < \mu < 1$ and $0 < \rho_1 < \rho_2 < 1$, and write $I_2 = (\Gamma_{\rho_2}(\mu) - \mu^2)/\rho_2$. Then $I_2 \le \mu$
and

\[ \Gamma_{\rho_2}(\mu) - \Gamma_{\rho_1}(\mu) \le 4 \cdot \frac{1 + \ln(\mu/I_2)}{1 - \rho_2} \cdot I_2 \cdot (\rho_2 - \rho_1). \]

Proof: Let

\[ d = \frac{1 + \ln(\mu/I_2)}{1 - \rho_2}. \]

The proof will rely on the fact that $\Gamma_\rho(\mu)$ is a convex function of $\rho$. This implies in particular
that $I_2 \le \mu$. Moreover, by the Mean Value Theorem it suffices to show that the derivative at $\rho_2$
is at most $4dI_2$. If we write the Hermite polynomial expansion of $\chi_\mu$ as $\chi_\mu(x) = \sum_i c_i h_i(x)$, then
$\Gamma_\rho(\mu) = \sum_i c_i^2 \rho^i$, and thus

\[ \frac{\partial \Gamma_\rho(\mu)}{\partial \rho} \Big|_{\rho = \rho_2} = \sum_{i \ge 1} i c_i^2 \rho_2^{i-1} = \sum_{1 \le i \le d+1} i c_i^2 \rho_2^{i-1} + \sum_{i > d+1} i c_i^2 \rho_2^{i-1}. \tag{31} \]

We will complete the proof by showing that both terms in (31) are at most $2dI_2$. The first term
is visibly at most $(d+1) I_2 \le 2dI_2$. As for the second term, the quantity $i \rho_2^{i-1}$ decreases for
$i \ge \rho_2/(1 - \rho_2)$. Since $d + 1 \ge (2 - \rho_2)/(1 - \rho_2) > \rho_2/(1 - \rho_2)$ the second term is therefore at most
$(d+1) \rho_2^d \sum_i c_i^2 \le (d+1) \rho_2^d \mu$. But

\[ \rho_2^d \le \rho_2^{\ln(\mu/I_2)/(1 - \rho_2)} \le I_2/\mu \]

since $\rho_2^{1/(1-\rho_2)} \le 1/e$ for all $\rho_2$. Thus the second term of (31) is also at most $(d+1) I_2 \le 2dI_2$, as
needed. □

Using the fact that $-I_2 \ln I_2$ is a bounded quantity we obtain:

Corollary B.5 For all $0 < \mu < 1$ and $0 < \rho < 1$, if $0 < \delta < (1 - \rho)/2$ then

\[ \Gamma_{\rho+\delta}(\mu) - \Gamma_\rho(\mu) \le \frac{O(1)}{1 - \rho} \cdot \delta. \]


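C Proof of Theorem 4.5

Proof: The proof is essentially the same as the proof of the "upper bound" part of the proof
of Theorem 4.1 in [43]. By way of contradiction, suppose the upper bound on $\delta$ holds and yet
$\mathbf{P}[T_\rho f > 1 - \delta] \ge \epsilon$. Let $g$ be the indicator function of a subset of $\{x : T_\rho f(x) > 1 - \delta\}$ whose
measure is $\epsilon$.

Let $h = 1_{\{f \le b\}}$, where $b = 1/2 + \mu/2$. By a Markov argument,

\[ \mu = \mathbf{E}[f] \ge (1 - \mathbf{E}[h]) b \ \Rightarrow \ \mathbf{E}[h] \ge 1 - \mathbf{E}[f]/b = \frac{1 - \mu}{1 + \mu}. \]

By another Markov argument, whenever $g(x) = 1$ we have

\[ T_\rho(1 - f) < \delta \ \Rightarrow \ T_\rho(h(1 - b)) < \delta \ \Rightarrow \ T_\rho h < \frac{2\delta}{1 - \mu}. \]

Thus

\[ \mathbf{E}[g T_\rho h] \le \frac{2\epsilon\delta}{1 - \mu}. \tag{32} \]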

But by Corollary 3.5 in |I2, (itself a simple corollary of the reverse Bonami-Beckner inequality), 

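\[ \mathbf{E}[g T_\rho h] \ge \epsilon \cdot \epsilon^{(\sqrt{\alpha} + \rho)^2/(1 - \rho^2)}, \tag{33} \]

where $\alpha = \log(1/\mathbf{E}[h]) / \log(1/\epsilon)$. (In Gaussian space, this fact can also be proven using Borell's
Corollary 4.3.) Note that since $\mathbf{E}[h] \ge (1 - \mu)/(1 + \mu)$ we get $\alpha \le O(c(\mu)/\log(1/\epsilon))$, which is also
at most 1 since we assume $\epsilon < 1 - \mu$. Therefore the exponent $(\sqrt{\alpha} + \rho)^2$ is $\rho^2 + O(\sqrt{\alpha})$. Now (33)
implies

\[ \mathbf{E}[g T_\rho h] \ge \epsilon \cdot \epsilon^{\rho^2/(1 - \rho^2)} \cdot \epsilon^{O\left(\sqrt{c(\mu)/\log(1/\epsilon)}\,/\,(1 - \rho)\right)} = \epsilon \cdot \epsilon^{\rho^2/(1 - \rho^2) + O(\kappa)}. \]

Combining (32) and (33) yields

\[ \delta \ge \frac{1 - \mu}{2} \cdot \epsilon^{\rho^2/(1 - \rho^2) + O(\kappa)} = \epsilon^{\log(2/(1 - \mu))/\log(1/\epsilon)} \cdot \epsilon^{\rho^2/(1 - \rho^2) + O(\kappa)} = \epsilon^{\rho^2/(1 - \rho^2) + O(\kappa)}, \tag{34} \]

a contradiction. □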


D Proof of Corollary 5.3

Proof: We define $f_n$ by setting $1 \le u \le n$ to be the odd integer nearest to $t\sqrt{n}$ (where $t$ is the
number chosen in Theorem 5.2) and then taking

\[ f_n(x) = \begin{cases} 1 & \text{if } |x| \in [1, u] \text{ or } |x| \in [-n, -(u+2)], \\ -1 & \text{if } |x| \in [u+2, n] \text{ or } |x| \in [-u, -1], \end{cases} \]

where $|x|$ denotes $\sum_{i=1}^n x_i$. This is clearly a completely symmetric odd function. It is well known
that for any boolean function, $\sum_i \mathrm{Inf}_i(f_n)$ equals the expected number of pivotal bits for $f_n$ in a
random input. One can easily see that this is $O(\sqrt{n})$. Thus each of $f_n$'s coordinates has influence
$O(1/\sqrt{n})$, by symmetry.


Let $p(n, s)$ denote the probability that the sum of $n$ independent $\pm 1$ Bernoulli random variables
is exactly $s$, so

\[ p(n, s) = 2^{-n} \binom{n}{\frac{1}{2} n + \frac{1}{2} s}, \]

and for a set $S$ of integers let $p(n, S)$ denote $\sum_{s \in S} p(n, s)$.


By symmetry all of $f_n$'s Fourier coefficients at level $d$ have the same value; we will write $\hat{f}_n(d)$
for this quantity. Since $f_n$ is odd, $\hat{f}_n(0) = \hat{f}_n(2) = 0$. By explicit calculation, we have

\begin{align*}
\hat{f}_n(1) &= p(n-1, 0) - 2p(n-1, u+1), \\
\hat{f}_n(3) &= \tfrac{1}{4} \big( p(n-3, \{-2, 2\}) - 2p(n-3, 0) \big) \\
&\quad - \tfrac{1}{4} \big( p(n-3, \{\pm(u-1), \pm(u+3)\}) - 2p(n-3, \{\pm(u+1)\}) \big) \\
&= -\frac{1}{n-1} \, p(n-3, 0) + 2 \cdot \frac{(n-1) - (u+1)^2}{(n-1)^2 - (u+1)^2} \, p(n-3, u+1),
\end{align*}

where the last equality is by explicit conversion to factorials and simplification. Using $p(n, t\sqrt{n}) =
(1 + o(1)) \sqrt{2/\pi} \, e^{-t^2/2} n^{-1/2}$ as $n \to \infty$, we conclude

\[ \hat{f}_n(1) \sim \sqrt{2/\pi} \, (1 - 2e^{-t^2/2}) \, n^{-1/2}, \qquad \hat{f}_n(3) \sim \sqrt{2/\pi} \, (-1 + 2(1 - t^2) e^{-t^2/2}) \, n^{-3/2}. \]

But the weight of $f_n$ at level 1 is $n \cdot \hat{f}_n(1)^2$ and the weight of $f_n$ at level 3 is $\binom{n}{3} \cdot \hat{f}_n(3)^2 \sim (n^3/6) \hat{f}_n(3)^2$;
thus the above imply (30) from Theorem 5.2 in the limit $n \to \infty$ and the proof is complete. □
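The finite-$n$ quantities in this proof can be computed exactly, which gives a quick sanity check of the limit. The sketch below is illustrative and not part of the paper: it builds $f_n$ as a function of the coordinate sum, computes $\hat{f}_n(d) = \mathbf{E}[f_n(x) x_1 \cdots x_d]$ by conditioning on the first $d$ bits (a standard identity, used here in place of the closed forms), and sums the weights $\binom{n}{d} \hat{f}_n(d)^2$ for $d \le 3$. The names `p`, `make_f`, `level_coeff`, `weight_up_to_3` and the choice $n = 2001$ are assumptions made for this example.

```python
import math

T_STAR = 2.69647  # value of t quoted in the proof of Theorem 5.2


def p(n: int, s: int) -> float:
    """P[sum of n independent uniform +-1 bits equals s]."""
    if abs(s) > n or (n + s) % 2:
        return 0.0
    return math.comb(n, (n + s) // 2) / 2 ** n


def make_f(n: int, t: float = T_STAR):
    """Return (f, u): f_n as a function of the odd coordinate sum m, and u."""
    u = 2 * round((t * math.sqrt(n) - 1) / 2) + 1  # odd integer nearest t*sqrt(n)

    def f(m: int) -> int:
        if 1 <= m <= u or m <= -(u + 2):
            return 1
        return -1  # covers m >= u + 2 and -u <= m <= -1

    return f, u


def level_coeff(n: int, d: int, f) -> float:
    """hat f_n(d) = E[f(x) x_1 ... x_d] for a completely symmetric f."""
    total = 0.0
    for s in range(-(n - d), n - d + 1, 2):      # sum of the other n - d bits
        inner = sum(
            math.comb(d, j) * (-1) ** (d - j) * f(s + 2 * j - d)
            for j in range(d + 1)                # j = number of +1 among first d bits
        )
        total += p(n - d, s) * inner / 2 ** d
    return total


def weight_up_to_3(n: int) -> float:
    """Total Fourier weight of f_n on levels 0..3."""
    f, _ = make_f(n)
    return sum(math.comb(n, d) * level_coeff(n, d, f) ** 2 for d in range(4))


n = 2001
w = weight_up_to_3(n)
```

For large odd $n$, `w` should sit near .75913 and above the Majority value $2/\pi + 1/(3\pi) \approx .7427$; the closed forms for $\hat{f}_n(1)$ and $\hat{f}_n(3)$ given in the proof can be compared against `level_coeff` directly.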